XERV: AI Training Datasets

Bottom Line Up Front: XERV provides a library of high-quality, synthetic, and anonymized datasets (PURE, TART, GRAD) engineered to enhance chain-of-thought reasoning and academic performance in foundational AI models.

Dataset Name | Size / Format | Description | Access

PURE
Tags: Reasoning, Chain-of-Thought, Logic, Mathematics, Code
Size / Format: 2.7M rows; viewer available
Description: Pretraining Universal Reasoning Engine. A massive, hyper-filtered, and structurally unified corpus designed to ignite chain-of-thought (CoT) reasoning capabilities in foundational large language models.
Attribution: Xerv-AI (2026)
Access: Download PURE Dataset
License: other

TART
Tags: Reasoning, Chain-of-Thought, Distillation, Instruction-Tuning, SFT
Size / Format: 344k rows; viewer available
Description: Textual Answers & Reasoning Traces. A hyper-robust, massively aggregated, and meticulously filtered corpus of instruction-tuning records engineered for Supervised Fine-Tuning (SFT) and the distillation of advanced Chain-of-Thought (CoT) reasoning capabilities.
Attribution: Xerv-AI (2026)
Access: Download TART Dataset
License: apache-2.0, mit

GRAD
Tags: Academic, Graduate, STEM
Size / Format: 1.93k samples; viewer available
Description: Graduate-level dataset focused on advanced academic content.
Attribution: Xerv-AI (2025)
Access: Download GRAD Dataset
License: MIT
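
The sizes listed above are abbreviated ("2.7M rows", "344k rows", "1.93k samples"), so comparing them takes a small conversion step. The sketch below is one way to turn those strings into approximate integer counts. The names `CATALOG_SIZES` and `parse_size` are illustrative, not part of any XERV tooling, and the counts are only as precise as the rounded figures published here.

```python
import re
from decimal import Decimal

# Size strings copied from the catalog above; the published figures are rounded.
CATALOG_SIZES = {
    "PURE": "2.7M rows",
    "TART": "344k rows",
    "GRAD": "1.93k samples",
}

# Assumption: "k" means thousand and "M" means million.
_MULTIPLIERS = {"": 1, "k": 1_000, "M": 1_000_000}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kM]?)\s+(rows|samples)\s*$")


def parse_size(text: str) -> int:
    """Convert a string such as '344k rows' to an approximate integer count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, suffix, _unit = match.groups()
    # Decimal avoids binary floating-point error in values such as 1.93 * 1000.
    return int(Decimal(number) * _MULTIPLIERS[suffix])


approx_counts = {name: parse_size(size) for name, size in CATALOG_SIZES.items()}

# Print the datasets from largest to smallest.
for name, count in sorted(approx_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ~{count:,} records")
```

By these figures, PURE is roughly eight times the size of TART, and GRAD is a small, specialized set of about two thousand samples.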