
A curated global dataset of social contact between diverse language communities

Eri Kashima*, Francesca Di Garbo, Oona Raatikainen, Robert Forkel, Rosnátaly Avelino, Sacha Beck, Anna Berge, Ana Blanco Pena, Ross Bowden, Nicolás Brid, Joseph M. Brincat, María Belén Carpio, Alexander Cobbinah, Paola Cúneo, Wotango Doyiso Deginet Wotango Doyiso, Anne Maria Fehn, Saloumeh Gholami, Arun Ghosh, Hannah Gibson, Elizabeth Hall, Katja Hannß, Hannah Haynie, Jerry J. Jacka, Mathias Jenny, Richard Kowalik, Sonal Kulkarni-Joshi, Maarten Mous, Marcela Mendoza, Cristina Messineo, Francesca Romana Moro, Hank Nater, Michelle Ocasio, Bruno Olsson, Ana María Ospina Bozzi, Agustina Paredes, Admire Phiri, Nicolas Quint, Erika Sandman, Dineke Schokkin, Ruth Singer, Ellen Smith-Dennis, Lameen Souag, Yunus Sulistyono, Yvonne Treis, Matthias Urban, Jill Vaughan, Georg Ziegelmeyer, Veronika Zikmundová, Ricardo Napoleão de Souza, Kaius Sinnemäki*
*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

The GramAdapt Social Contact Dataset is a curated dataset of 34 language pairs with qualitative and quantifiable data on social interaction and aspects of societal multilingualism. The language pairs were sampled globally to represent the world’s linguistic diversity. The dataset can be used to interrogate the social dimensions of language contact independently or in conjunction with appropriate linguistic data. The data were collected by distributing a questionnaire to experts who have experience with either one or both of the language communities of a pair. The data represent subjective expert assessments based on choices from predetermined answers that can be quantified. Authors 1, 2 and 3 manually checked the responses to identify possible misjudgments or misunderstandings. This results in a dataset containing 13,493 data points. This dataset is the first of its kind in the field of linguistics, built upon a wide range of findings from sociolinguistics, historical linguistics, psycholinguistics, and linguistic anthropology.

Original language: English
Article number: 1958
Peer-reviewed scientific journal: Scientific Data
Volume: 12
Issue number: 1
ISSN: 2052-4463
Publication status: Published - 12.2025
MoE publication type: A1 Journal article - refereed

Keywords

  • 612,1 Languages