Karata area, Akhvakh district

What is TALD?

The Typological Atlas of the Languages of Daghestan (TALD) is a tool for the visualization of information about linguistic structures typical of Daghestan. The scope of the project currently covers all East Caucasian languages and several other languages spoken in Daghestan, Chechnya, Ingushetia and adjacent territories.

The Atlas consists of:

Daghestan as a linguistic area

Daghestan is the most linguistically diverse part of the Caucasus, with at least 40 different languages (and many more highly divergent idioms) spoken on a territory of 50,300 km² that consists mostly of mountainous terrain. The majority of the languages spoken here belong to the East Caucasian (or Nakh-Daghestanian) language family, one of the three language families indigenous to the Caucasus. For the most part, the languages of the East Caucasian family are spoken only in the eastern Caucasus area (with the exception of some relatively recent diasporic communities). They have no proven genealogical relationship to any other languages or language families.

Other languages spoken in Daghestan include three Turkic languages, Nogai and Kumyk (both Kipchak) and Azerbaijani (Oghuz), and three Indo-European languages: Russian (Slavic, the major language of administration, education, and urban areas), Armenian (Armenic), and Tat (Iranian). Arabic is the language of religion, as most people in Daghestan are Sunni Muslims. The official languages of Daghestan (in alphabetical order) are Aghul, Avar, Azerbaijani, Chechen, Dargwa, Kumyk, Lak, Lezgian, Nogai, Russian, Rutul, Tabasaran, Tat, and Tsakhur.

Historically, there was no single lingua franca for the whole area. As a result, Daghestanians were known for their command of several locally important languages, which they acquired through seasonal labor migration, trade at major markets, and other forms of contact. These patterns are now disappearing fast due to the expansion of Russian.

One of the aims of TALD is to chart the genealogical and geographical distribution of linguistic features and to facilitate multi-faceted analyses of language contact in Daghestan by comparing the presence of shared features with known patterns of bilingualism and lexical convergence.

Map visualizations

The Atlas currently offers four different types of map visualizations:

  1. Extrapolated data
  2. Data granularity
  3. General datapoints
  4. General datapoints (feature only)

Each of these visualizations has its benefits and drawbacks, so we allow the user to toggle between the different options.

Below are some examples from the chapter on Morning greetings, which describes the two main ways to greet someone in the morning in the languages of Daghestan: wishing them a good morning or asking them whether they woke up.

1. Extrapolated data

This is our default visualization. It represents each language as a cluster of dots, which correspond to the villages where that language is spoken.¹ The inside of each dot is colored by language; languages from the same group have similar colors (e.g. all Lezgic languages are some shade of green). Hover over a dot to see the name of the language, and click it to view a popup with the name of the village and a link to the language's page in the Glottolog database. The color of each dot's outer ring indicates the value of a linguistic feature.

A benefit of this type of visualization is that it shows the size and boundaries of speech communities (as opposed to maps based on abstract general datapoints). Its main drawback is that it involves a lot of generalization. We do not have information on each village variety of the languages in our sample, so we extrapolate the information we have on a language or dialect to all the villages where it is spoken. In doing so, we risk overgeneralizing and erasing possible dialectal differences.
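The extrapolation step can be illustrated with a minimal sketch. This is not the Atlas's actual code: the `Village` record, the `extrapolate` function and the placeholder languages and values are all hypothetical. It simply copies the value recorded for a language to every village where that language is spoken, which is exactly why real village-level differences can be hidden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Village:
    # Hypothetical record for one dot on the map.
    name: str
    language: str


def extrapolate(villages, language_values):
    """Give every village the feature value recorded for its language.

    Villages whose language has no recorded value are left out,
    so they would not be colored on the map.
    """
    return {
        v.name: language_values[v.language]
        for v in villages
        if v.language in language_values
    }


# Placeholder data: two villages of "Language X", one of "Language Y".
villages = [
    Village("village A", "Language X"),
    Village("village B", "Language X"),
    Village("village C", "Language Y"),
]
print(extrapolate(villages, {"Language X": "wish good morning"}))
```

Both villages of "Language X" receive the same value, even if one of them actually behaved differently.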

2. Data granularity

The data granularity visualization shows how specific the data behind each datapoint of the previous visualization is. For example, "village dialect" indicates that we had information about the feature for this specific village variety, while "language" means that we only had information for the language in general, from which we extrapolated the value for this point. This allows the user to see what kind of data underlies the default visualization.
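One way to picture how a granularity label relates to the extrapolated value is a lookup that prefers the most specific source available. The sketch below is an assumption for illustration only: the function `resolve_value`, the three dictionaries and the intermediate "dialect" level (suggested by the mention of dialect data above, but not confirmed as a label used by the Atlas) are hypothetical.

```python
from typing import Optional, Tuple


def resolve_value(
    village: str,
    dialect: Optional[str],
    language: str,
    village_data: dict,
    dialect_data: dict,
    language_data: dict,
) -> Optional[Tuple[str, str]]:
    """Return (feature value, granularity label) for one village.

    Sources are tried from most to least specific; None means
    there is no data at any level.
    """
    if village in village_data:
        return village_data[village], "village dialect"
    # Hypothetical intermediate level: data for a whole dialect.
    if dialect is not None and dialect in dialect_data:
        return dialect_data[dialect], "dialect"
    if language in language_data:
        return language_data[language], "language"
    return None


# Placeholder data: village-level data exists only for "village A".
print(resolve_value("village A", None, "Language X",
                    {"village A": "ask"}, {}, {"Language X": "wish"}))
print(resolve_value("village B", None, "Language X",
                    {"village A": "ask"}, {}, {"Language X": "wish"}))
```

The first call returns the village-specific value; the second falls back to the language-level value and is labeled accordingly.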

Our goal for the Atlas is to continue adding new data to existing datasets and thus gradually improve its coverage and accuracy.

3. General datapoints

Because the village-based maps can be visually overwhelming, the Atlas also provides a more basic visualization that shows one dot on the map for each language in the sample.

4. General datapoints (feature only)

This visualization is similar to the previous one but shows only the distribution of the feature values, without the distraction of genealogical information.
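The two general-datapoint maps can be sketched as collapsing each language's villages into a single point and then, for the feature-only variant, dropping the language label. Where the Atlas actually places the one dot per language is not stated here; this sketch assumes, purely for illustration, the mean of the village coordinates. The functions `general_datapoints` and `feature_only` and the placeholder data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def general_datapoints(villages):
    """Collapse (name, language, lat, lon) records into one point per language.

    Assumption: the point is the plain mean of the village coordinates,
    which is a reasonable approximation over a small area.
    """
    coords = defaultdict(list)
    for _name, language, lat, lon in villages:
        coords[language].append((lat, lon))
    return {
        language: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for language, pts in coords.items()
    }


def feature_only(points, language_values):
    """Keep each point's coordinates and feature value, dropping the language."""
    return [
        (lat, lon, language_values[language])
        for language, (lat, lon) in points.items()
        if language in language_values
    ]


villages = [
    ("village A", "Language X", 42.0, 46.0),
    ("village B", "Language X", 42.5, 47.0),
    ("village C", "Language Y", 41.5, 48.0),
]
points = general_datapoints(villages)
print(points)
print(feature_only(points, {"Language X": "wish", "Language Y": "ask"}))
```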

Contribute to the Atlas

The chapters and datasets in the Atlas are created by researchers specializing in the languages of Daghestan as well as by students of linguistics with no prior knowledge of the area and the languages spoken there.

If you would like to contribute a chapter and/or data to the Atlas because you are studying a certain topic in the languages of Daghestan, or you are a student looking for an internship, do not hesitate to contact us! You can find our contact information under Team.

To get a better idea of our methodology and what you will have to do if you decide to become a contributor, see our Contributor Manual.

How to cite

Text

Daniel, Michael, Konstantin Filatov, George Moroz, Timofey Mukhin, Chiara Naccarato and Samira Verhees. 2020. Typological Atlas of the languages of Daghestan (TALD). Moscow: Linguistic Convergence Laboratory, HSE University. URL: http://lingconlab.ru/dagatlas. Accessed on

BibTeX

@book{tald2020,
  address   = {Moscow},
  author    = {Daniel, Michael and Filatov, Konstantin and Moroz, George and Mukhin, Timofey and Naccarato, Chiara and Verhees, Samira},
  publisher = {Linguistic Convergence Laboratory, HSE University},
  title     = {Typological Atlas of the languages of Daghestan (TALD)},
  url       = {http://lingconlab.ru/dagatlas},
  year      = {2020}
}

  1. This visualization makes use of the East Caucasian villages dataset.

2020, Linguistic Convergence Laboratory, HSE University