Transformer Architecture Scaling Laws for Scientific Text Classification

Wei Chen*, Jiyeon Kim

Research Article · 2024 · Open Access · Peer Reviewed

Abstract

We investigate scaling laws for transformer-based language models applied to scientific literature classification. Experiments across 12 benchmark datasets demonstrate that model performance scales predictably with parameter count and training-data volume, and that domain-specific pre-training yields an 18% improvement over pre-training on general corpora. Beyond 1B parameters, optimal compute allocation favors scaling data over scaling parameters.
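
The page does not reproduce the paper's fitted functional form, but scaling results of this kind are commonly modeled with a Chinchilla-style parametric loss L(N, D) = E + A·N^(-alpha) + B·D^(-beta), where N is parameter count and D is training-data volume. The sketch below shows how such a law could be fitted with scipy.optimize.curve_fit; the scaling_law function, all coefficients, and the synthetic observations are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: fitting a Chinchilla-style parametric scaling law
#   L(N, D) = E + A * N**(-alpha) + B * D**(-beta)
# to (parameter count, data volume, loss) observations.
# All numbers below are synthetic placeholders, not results from the paper.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, alpha, B, beta):
    """Loss as a function of parameter count N and training-data volume D."""
    N, D = X
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Hypothetical grid of model sizes and data volumes.
N, D = np.meshgrid([1e8, 3e8, 1e9, 3e9], [1e9, 3e9, 1e10, 3e10])
N, D = N.ravel(), D.ravel()

# Synthetic losses generated from assumed "true" coefficients plus noise,
# standing in for measured validation losses from real training runs.
rng = np.random.default_rng(0)
loss = scaling_law((N, D), 1.7, 400.0, 0.34, 400.0, 0.28)
loss += rng.normal(0.0, 0.01, size=loss.shape)

# Fit the five coefficients; bounds keep all terms positive.
popt, _ = curve_fit(
    scaling_law, (N, D), loss,
    p0=[2.0, 100.0, 0.3, 100.0, 0.3],
    bounds=(0.0, np.inf), maxfev=50_000,
)
E, A, alpha, B, beta = popt
print(f"E={E:.2f}  A={A:.1f}  alpha={alpha:.3f}  B={B:.1f}  beta={beta:.3f}")
```

Under a fixed compute budget C proportional to N·D, minimizing this loss gives D growing as C^(alpha/(alpha+beta)), so a fitted alpha larger than beta pushes the optimal allocation toward data rather than parameters, the same qualitative conclusion the abstract reports for models beyond 1B parameters.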

Publication Information

Accepted: January 10, 2024

Author Information

Wei Chen (Corresponding Author)
Affiliation: MIT Laboratory for Artificial Intelligence

Jiyeon Kim
Affiliation: KAIST School of Computing, South Korea
Keywords: transformer, scaling laws, text classification, NLP, scientific literature
