Transformer Architecture Scaling Laws for Scientific Text Classification
Journal of Artificial Intelligence Research • Vol. 13, No. 1
Abstract
We investigate scaling laws for transformer-based language models applied to scientific literature classification. Our experiments across 12 benchmark datasets demonstrate that model performance scales predictably with parameter count and training-data volume, with domain-specific pre-training yielding an 18% improvement over pre-training on general corpora. Beyond 1B parameters, optimal compute allocation favors scaling data over scaling parameters.
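For context, scaling laws of this kind are commonly parameterized as a power law in parameter count $N$ and training tokens $D$ (the form popularized by Hoffmann et al., 2022; this sketch is illustrative, and the constants are assumptions rather than the fitted values reported here):
$$
\mathcal{L}(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
$$
where $\mathcal{L}$ is held-out loss, $E$ is the irreducible loss, and $A$, $B$, $\alpha$, $\beta$ are fitted constants. Under this parameterization, a regime in which data scaling dominates corresponds to the marginal reduction in $\mathcal{L}$ from increasing $D$ exceeding that from increasing $N$ at a fixed compute budget.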