XERV: AI Training Datasets

Bottom Line Up Front: XERV provides a library of high-quality, synthetic, and anonymized datasets (PURE, TART, GRAD) engineered to enhance chain-of-thought reasoning and academic performance in foundational AI models.

| Dataset Name | Size / Format | Description | Access |
|---|---|---|---|
| PURE (tags: Reasoning, Chain-of-Thought, Logic, Mathematics, Code) | 2.7M rows; viewer available | Pretraining Universal Reasoning Engine. A large, heavily filtered, and structurally unified corpus designed to elicit chain-of-thought (CoT) reasoning in foundational large language models. Citation: Xerv-AI (2026). | Download PURE Dataset. License: other |
| TART (tags: Reasoning, Chain-of-Thought, Distillation, Instruction-Tuning, SFT) | 344k rows; viewer available | Textual Answers & Reasoning Traces. A large, aggregated, and carefully filtered corpus of instruction-tuning records built for Supervised Fine-Tuning (SFT) and for distilling advanced chain-of-thought (CoT) reasoning. Citation: Xerv-AI (2026). | Download TART Dataset. License: Apache-2.0, MIT |
| GRAD (tags: Academic, Graduate, STEM) | 1.93k samples; viewer available | Graduate-level dataset focused on advanced academic STEM content. Citation: Xerv-AI (2025). | Download GRAD Dataset. License: MIT |
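
To pick a dataset from this listing programmatically, the entries can be held in a small catalog and filtered by tag. The following is a minimal sketch, not an official XERV client. It assumes the datasets are hosted on the Hugging Face Hub under a hypothetical organization name `Xerv-AI`, with repository names matching the dataset names and a `train` split. None of this is confirmed by the listing. The helper names (`DatasetEntry`, `repo_id`, `datasets_with_tag`, `stream_rows`) are illustrative only.

```python
from dataclasses import dataclass
from itertools import islice

# ASSUMPTION: Hub organization name. The listing only gives "Xerv-AI" as the
# citation, so the real repository ids may differ.
HUB_ORG = "Xerv-AI"


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    approx_rows: int          # rounded figure taken from the listing
    tags: tuple[str, ...]
    license: str


# Catalog transcribed from the listing above.
CATALOG = {
    "PURE": DatasetEntry(
        "PURE", 2_700_000,
        ("Reasoning", "Chain-of-Thought", "Logic", "Mathematics", "Code"),
        "other",
    ),
    "TART": DatasetEntry(
        "TART", 344_000,
        ("Reasoning", "Chain-of-Thought", "Distillation",
         "Instruction-Tuning", "SFT"),
        "apache-2.0, mit",
    ),
    "GRAD": DatasetEntry(
        "GRAD", 1_930,
        ("Academic", "Graduate", "STEM"),
        "mit",
    ),
}


def repo_id(name: str) -> str:
    """Build an assumed Hub repository id such as 'Xerv-AI/PURE'."""
    if name not in CATALOG:
        raise KeyError(f"unknown dataset: {name}")
    return f"{HUB_ORG}/{name}"


def datasets_with_tag(tag: str) -> list[str]:
    """Return the names of catalog entries carrying the given tag."""
    return [name for name, entry in CATALOG.items() if tag in entry.tags]


def stream_rows(name: str, limit: int = 3):
    """Yield the first few rows without downloading the whole dataset.

    Needs network access and the third-party `datasets` package, which is
    imported lazily so the rest of this sketch runs without it.
    ASSUMPTION: the repository exposes a split named "train".
    """
    from datasets import load_dataset

    stream = load_dataset(repo_id(name), split="train", streaming=True)
    yield from islice(stream, limit)
```

For example, `datasets_with_tag("Chain-of-Thought")` selects PURE and TART, and `list(stream_rows("GRAD"))` would fetch a handful of GRAD samples if the assumed repository id and split name turn out to be right. Streaming is used here because PURE is listed at 2.7M rows, which may be slow to download in full just to inspect a few records.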