XERV: AI Training Datasets
Bottom Line Up Front: XERV provides a library of high-quality, synthetic, and anonymized datasets (PURE, TART, GRAD) engineered to enhance chain-of-thought reasoning and academic performance in foundational AI models.
| Dataset Name | Size / Format | Description | Access |
|---|---|---|---|
| PURE (Reasoning, Chain-of-Thought, Logic, Mathematics, Code) | 2.7M rows; viewer available | Pretraining Universal Reasoning Engine. A massive, hyper-filtered, and structurally unified corpus designed to ignite chain-of-thought (CoT) reasoning capabilities in foundational large language models. Xerv-AI (2026) | Download PURE Dataset; license: other |
| TART (Reasoning, Chain-of-Thought, Distillation, Instruction-Tuning, SFT) | 344k rows; viewer available | Textual Answers & Reasoning Traces. A hyper-robust, massively aggregated, and meticulously filtered corpus of instruction-tuning records engineered for Supervised Fine-Tuning (SFT) and the distillation of advanced Chain-of-Thought (CoT) reasoning capabilities. Xerv-AI (2026) | Download TART Dataset; license: apache-2.0, mit |
| GRAD (Academic, Graduate, STEM) | 1.93k samples; viewer available | Graduate-level dataset. Focus on advanced academic content. Xerv-AI (2025) | Download GRAD Dataset; license: MIT |
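
The sizes in the table are abbreviated ("2.7M rows", "344k rows", "1.93k samples"). As a minimal sketch, assuming "k" means thousand and "M" means million, the snippet below expands them into approximate counts so the three datasets can be compared directly. The helper name `parse_size` is hypothetical and not part of any XERV tooling; the results are only as precise as the rounded figures shown in the table.

```python
import re

# Assumed multipliers for the suffixes used in the table ("k" and "M").
_SUFFIXES = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_size(text: str) -> int:
    """Convert an abbreviated size such as '2.7M rows' to an approximate count."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*([kKmM]?)\b", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix = match.groups()
    # round() absorbs floating-point noise from the multiplication.
    return round(float(number) * _SUFFIXES[suffix.lower()])


# Sizes copied from the table above.
sizes = {
    "PURE": "2.7M rows",
    "TART": "344k rows",
    "GRAD": "1.93k samples",
}

for name, label in sizes.items():
    print(f"{name}: ~{parse_size(label):,} records")
```

Because the published figures are rounded, these counts are estimates and should not be treated as exact row totals.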