Dolma3 Unique Document Corpus Explorer

1.1B deduplicated documents, 2.1T tokens, drawn from the OLMo3 training pool (allenai/dolma3_pool). Deduplication was performed at the document level. Each document is classified with WebOrganizer (24 topics, 24 formats) and scored with the allenai/dolma3-fasttext-quality-classifier. The quality score measures text coherence (well-formedness), not content value.
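As a rough illustration of how the three annotations (topic, format, quality score) can be combined when exploring the corpus, here is a minimal sketch in Python. The record layout, the field names (`topic`, `format`, `quality`), the placeholder label values, and the `filter_docs` helper are all assumptions made for this example; they are not the published schema, and the sample rows are invented.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional

# Hypothetical in-memory records. Field names and label values are
# placeholders for illustration, not the corpus's real schema or labels.
SAMPLE_DOCS = [
    {"id": "a", "topic": "topic_a", "format": "format_x", "quality": 0.91},
    {"id": "b", "topic": "topic_a", "format": "format_y", "quality": 0.32},
    {"id": "c", "topic": "topic_b", "format": "format_x", "quality": 0.77},
]


def filter_docs(
    docs: Iterable[dict],
    topic: Optional[str] = None,
    fmt: Optional[str] = None,
    min_quality: float = 0.0,
) -> Iterator[dict]:
    """Yield documents matching the given topic/format with quality >= min_quality."""
    for doc in docs:
        if topic is not None and doc["topic"] != topic:
            continue
        if fmt is not None and doc["format"] != fmt:
            continue
        if doc["quality"] < min_quality:
            continue
        yield doc


# Keep only "topic_a" documents whose (assumed) quality score is at least 0.5.
selected = list(filter_docs(SAMPLE_DOCS, topic="topic_a", min_quality=0.5))

# Count documents per topic label across the sample.
topic_counts = Counter(doc["topic"] for doc in SAMPLE_DOCS)
```

Because the quality score reflects well-formedness rather than content value, a threshold like the one above would select for coherent text only; it says nothing about whether a document is useful or accurate.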