Code-Switching Patterns in Multilingual Social Media: A Corpus Study

Elena Garcia*

Research Article | 2024 | Open Access | Peer Reviewed

Abstract

We analyze a 4.2-million-token corpus of multilingual Twitter data to characterize intra-sentential code-switching across English-Spanish, English-Arabic, and English-Mandarin pairs. Matrix language frame analysis reveals systematic asymmetries in embedded-language selection that correlate with topic domain and audience design. Political and sports topics show the highest switching rates (38% and 31% of tokens, respectively).

Publication Information

Accepted: January 5, 2024

Author Information

Elena Garcia (Corresponding Author)
Affiliation: Universitat Autonoma de Barcelona, Department of Linguistics
Keywords: code-switching, multilingualism, corpus linguistics, social media, matrix language frame
