The GeLaTo dataset is a worldwide diversity panel of available population genetic samples matched with databases of linguistic, cultural and environmental diversity. Population genetic samples are assigned to existing Glottocodes according to ethnolinguistic criteria, and the data are filtered following the guidance of geneticists, linguists, cultural anthropologists and historians. The genetic data were chosen according to a set of essential guidelines: maximum compatibility and standardization, modern high-quality data, avoidance of ascertainment bias, availability for different regions of the world, and resolution high enough to capture recent events. The dataset provides computed summary statistics such as genetic diversity within a population, genetic proximity between pairs of populations, sharing of identical motifs, and demographic history reconstructions. The genetic samples are directly linked to the Glottolog and D-PLACE databases and to their original publications. The current version hosts summary statistics from the autosomal STR genetic diversity panel of Pemberton et al. 2013. It will be expanded to include mtDNA genomes, Y chromosome STRs, and autosomal SNPs.
If you use this data, please cite
Barbieri et al. 2022. A global analysis of matches and mismatches between human genetic and linguistic histories. PNAS. DOI: 10.1073/pnas.2122084119
as well as the released version of the dataset.
Icons made by Freepik from www.flaticon.com are licensed under CC BY 3.0.