A corpus of Slavic dialects in Albania


Welcome to the start page of the Corpus of Slavic dialects in Albania.



“Jazikot eje kaj čovekot. Eje zdraf, ako ese ljuditi zdravi. Eje živ, ako este i ljuditi živi. Umbri čoveko, umbri i jaziko. Mjene mi grjej žl’e ščo jaziko naš umbre. Toko umbreje i ljuditi, i koj ža zborvi? Nema koj da zborvi...”

“Languages are like humans. They are healthy if the people are healthy. They are alive if the people are alive. If a person dies, a language dies. I feel sorry that our language died, but the people died as well, and who is going to speak [the language]? Nobody is…”

These are the main parameters of the corpus:

Dialect | Rural locations | Urban locations | Size, thousand words (excluding the utterances of the researchers) | Morphological analysis
Korça Macedonian | Boboshtica | Korça | 34.0 | Classla 2.1.1 for Macedonian, partial manual postprocessing
Prespa Macedonian | Pustec, Gorna Gorica, Dolna Gorica, Shulin | Elbasan, Korça | 171.3 | Classla 2.1.1 for Macedonian, partial manual postprocessing
Golloborda Macedonian | Trebisht, Vërnica, Malestreni | Durrës, Elbasan, Tirana | 239.7 | Classla 2.1.1 for Macedonian, partial manual postprocessing
Myzeqe Štokavian | Rreth Libofsha, Petova | Fier | 58.8 | Classla 2.1.1 for Serbo-Croatian, partial manual postprocessing
Shijak Štokavian | Borake, Koxhas | Shijak, Sukth | 68.8 | Classla 2.1.1 for Serbo-Croatian, partial manual postprocessing
Albanian | All of the above | All of the above | 34.7 | uniparser-albanian
Other languages: Bulgarian, English, French, German, Greek, BCMS, Italian, Russian, Turkish | | | 4.5 | not analyzed
Total | | | 611.8 |

Previous research on Slavic dialects in Albania

This project is not the first study of Slavic dialects in Albania (SDAs), but we did consider varieties that have never been studied (e.g., Slavic speech in urban Albanian settings; Myzeqe Štokavian).

Starting from the groundbreaking monograph Slavic populations of Albania by A. Seliščev (1931), there have been several publications on SDAs that covered both the history of the Slavic population in this country (e.g., Ylli’s (1997, 2000) monograph on Slavic borrowings in Albanian toponymics) and its current state (Bojović 1991; Tončeva 2014; Vidoeski 1998). Of the utmost importance is the four-volume work “Die slavischen Minderheiten in Albanien” by Steinke and Ylli (2007, 2008, 2010, 2013), supported by a Deutsche Forschungsgemeinschaft (DFG) grant between 2002 and 2011.

In the selected publications listed, as well as in those dedicated to separate dialects (see the outline of dialects below), you can find descriptions of the language systems of the SDAs, information about the current status of the communities that use the SDAs, and dialectal transcripts.

Selected language varieties

The labels for the language varieties included in the corpus make no claims about the national, ethnic, or other identities of the speakers; they are provided purely for orientation within the respective dialectologies. Nor are the labels necessarily those that the speakers themselves used. In fact, some speakers do not use any labels at all, while others use a variety of labels, often non-terminologically. The language issue within some of the ethnolinguistic minorities in Albania is seriously politicized; however, this corpus does not advance the political claims of any organization, party, group of individuals, or state. The language labels used in the external sources quoted here are given as in the original, for identification purposes only.

Five dialects were chosen for this project, as shown on the Google Map.

Golloborda Macedonian

Golloborda Macedonian is a peripheral Balkan Slavic dialect that continues West Macedonian Debar dialects in Albanian territory. It is spoken in 15 villages in the Albanian regions of Dibra and Elbasan, as well as in migrant communities in the cities of Durrës, Tirana, and Elbasan. It has been estimated that it has more than 7,000 speakers in total. In its rural centers, this community has been studied thoroughly by a team of researchers from the Institute of Linguistic Studies (Russian Academy of Sciences), Saint Petersburg State University, and Peter the Great Museum of Anthropology and Ethnography (Kunstkammer); their research resulted in a valuable monograph that was translated into Albanian and Macedonian. Notably, our corpus-based research focused on the sociolinguistic variation and changes within this dialect, specifically between its rural and urban centers, so our methods and data differ from those in the research of our peers from Saint Petersburg.

Selected literature: Steinke & Ylli (2008); Sobolev & Novik (2013, 2017, 2018).

Korça Macedonian

Korça Macedonian is an apparently extinct Balkan Slavic island dialect (structurally close to the dialectal area of Southeastern Macedonian). The corpus includes speech samples from the last six speakers (three of whom lived in the village of Boboshtica and three of whom lived in the town of Korça in Southeastern Albania but were originally from Drenova). Despite the community being so small, this dialect was crucial for the project, as it has been subjected to the most prolonged and intensive Albanian influence. However, family discussions could not be organized in this community.

Selected literature: Mazon (1936); Mazon and Filipova-Bajrova (1965); Steinke and Ylli (2007).

Prespa Macedonian

Prespa Macedonian is a peripheral Balkan Slavic dialect that continues the West Macedonian Ohrid-Prespa dialects on the Albanian side of Great Prespa Lake, transitional to Southeastern Macedonian. According to estimates, it has around 4,500 speakers in nine villages of the region and two large towns, namely Korça and Bilisht.

Selected literature: Steinke and Ylli (2007); Cvetanovski (2010).

Myzeqe Štokavian

Myzeqe Štokavian is a Štokavian island dialect (a Novi Pazar-Sjenica dialect of the Zeta-Sjenica dialectal zone) used in several quarters of Fier and in several villages around this town. It is spoken among recent (1920s) migrants from the Sandžak region of what is now Southwestern Serbia and the bordering region of Montenegro.

Selected literature: Makartsev and Kikilo (2022); Makartsev (2023).

Shijak Štokavian

Shijak Štokavian is a Štokavian island dialect (a central Herzegovinian subdialect of the East Bosnian dialectal zone, found in the Mostar–Čapljina–Stolac triangle) used in the village of Borake and its satellite village of Koxhas. It is spoken among relatively recent (from the 1880s) migrants from the Mostar region in what is now Bosnia and Herzegovina, by 150 to 220 families across both villages. Speakers of this dialect also live in the towns of Shijak and Sukth.

Selected literature: Steinke and Ylli (2013); Makartsev and Kikilo (2022); Makartsev (2023).

Sociolinguistic diversity among the SDAs

These dialects have varying degrees of structural affinity with Albanian due to their differing connections to the Balkan sprachbund. Structurally, the closest to Albanian are the Balkan Slavic dialects of Korça, Golloborda, and Prespa. The Štokavian dialects of Myzeqe and Shijak are not included in the Balkan sprachbund and show less structural affinity with Albanian.

The selected dialects do not represent all the SDAs; one of the most complete lists of SDAs can be found in Steinke and Ylli’s monograph. However, the selected dialects do cover the variety of elements and parameters that have caused the diversity in the SDAs.

Two of the dialects are Štokavian (Myzeqe and Shijak), but their speakers have differing ethnopolitical and linguistic orientations: Our interviewees in Shijak usually articulated their Bosniak identity; Myzeqe speakers usually clarified their Bosniak or Serbian identity. Three dialects are Balkan Slavic (Golloborda, Korça, Prespa). The orientation of the speakers of these dialects toward a standard language (Macedonian or Bulgarian) is usually individual. The ethnopolitical and linguistic orientations mentioned by the speakers were thus not interpreted as making any political claims, but they allowed us to thematize the orientation of the speakers toward one of the standard Southern Slavic languages and better explain certain features in their speech.

Four of the communities have a rural (more conservative) and an urban (less conservative) center (except Korça Macedonian, whose number of speakers did not allow us to construct this opposition). For Shijak, the labor activity of the speakers was important, especially in terms of whether their jobs were connected to work at the national road services. (Such workers have daily contact with the language of TIR truck drivers who speak BCMS.) The opposition of rural versus urban was not relevant for the Shijak data because of the short distance between the settlements and the small size of the towns of Sukth and Shijak. The city of Durrës, which is where many of the dialectal speakers work daily, is also located too close to Shijak to allow for the shaping of a significant urban colony with distinct features.

Religion was another factor that we considered might influence linguistic identity choices (cf. the linguo-confessional situation in Bosnia and Herzegovina and regions of Montenegro and Serbia populated by a Štokavian-speaking but traditionally Muslim population) since it is relevant to numerous distinctive features of the traditional culture. All Myzeqe and Shijak Štokavian speakers that we interviewed culturally and traditionally belong to Sunni Islam. Among the Balkan Slavic communities, all our Korça and Prespa speakers culturally belong to Orthodox Christianity, while Golloborda is heterogeneous, with Sunni Islam predominating.

Topics of our interviews

Topic | Type | Bibliographical reference
1. Narratives and memorates | unstructured |
2. Ethnographical and ethnolinguistic interviews: | |
- 2.1. Calendar, rites of passage (birth, marriage, death), demonology | semi-structured | (Plotnikova 2009, see the online publication)
- 2.2. Rites and beliefs connected to the moon | semi-structured | (Čëxa 2009)
- 2.3. Rites and beliefs connected to the cuckoo | semi-structured | (Makartsev 2017)
3. Frog, Where Are You? | | (Mayer 1969, see preview; Berman et al. 1994–2004)
- 3.1. Conducted by the researchers | graphic |
- 3.2. Conducted by trained local assistants | graphic |
4. Family talks | unstructured | (Hentschel and Zeller 2013)

The narratives and memorates (T. 1) were unstructured discussions about the oral history and current problems of a given community that also provided insights into the identity and politics of memory of the community. The researchers led these discussions.

The ethnographical and ethnolinguistic interviews (T. 2) were conducted to collect ethnographical and ethnolinguistic information. They comprised the informants’ answers to our questions and covered various aspects of the traditional culture. We mainly followed the structure of the questionnaires (or interview designs) listed in the table, with slight adaptations.

Frog, Where Are You? (T. 3) is a book with 24 pictures that combine to form a visual narrative. This section was structured as a questionnaire, ensuring that the researcher was minimally involved. We also asked our trained local assistants to record themselves or their relatives and friends answering this questionnaire; therefore, the data that we collected here resembled real-life language use.

Our trained local assistants organized the family talks (T. 4) in our absence. The aim was to record spontaneous speech, so the topics were irrelevant. Since the same assistants prepared the transcripts, they could omit any sections that contained potentially harmful information or could have been used to identify the speakers.

Collecting this type of data was most successful for Golloborda Macedonian speakers since we had a network of trained local assistants upon whom we could rely.

We managed to arrange a few family talks among Prespa Macedonian speakers and just one family talk with Shijak Štokavian speakers.

For Myzeqe Štokavian, arranging family talks has not yet been successful.

For Korça Macedonian, such discussions were impossible since none of our speakers still used the dialect daily, although they could still speak it with us. The epigraph above was spoken by one of the speakers from Drenova (Dre01), who was describing the frustration he felt while witnessing the attrition and loss of his native dialect.

Speakers

The speakers in the transcripts were divided into three main categories:

1) Native speakers of the respective SDAs. They were anonymized. All information that could be used for their identification was manually removed from the corpus (tagged as ((ERASED))). We also changed their voices in order to make them unrecognizable. All speakers of this category were referenced with indices comprising three letters (for the settlement) and two digits. We also referred to them by these indices in our publications based on the corpus.

2) Researchers. They were only referenced with letter indices. Their names are provided in the Acknowledgments section.

3) SPK. This abbreviation was used to refer to all other speakers whose speech was transcribed for context but was not annotated for various reasons (an unknown neighbor passing by the window and saying hello, an Albanian-speaking waiter in a village café, some unidentified background voices, etc.).
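The indexing scheme described above can be illustrated with a minimal sketch. It is not part of the corpus tooling; the function name `classify_index` and the exact patterns are assumptions made for illustration. The sketch allows three or four letters before the digits, because the list of speakers also contains indices such as Lesh02, and it treats Non01 (listed as "all non-transcribed speakers") as a special case alongside SPK.

```python
import re

# Hypothetical patterns inferred from the description and the speaker list.
NATIVE_RE = re.compile(r"^[A-Z][a-z]{2,3}\d{2}$")   # e.g. Dre01, Bor12, Lesh02
RESEARCHER_RE = re.compile(r"^[A-Z]{2,3}$")         # e.g. MM, AC, MMI
SPECIAL = {"SPK", "Non01"}                          # non-annotated / non-transcribed


def classify_index(index: str) -> str:
    """Return 'native', 'researcher', 'other', or 'unknown' for a speaker index."""
    if index in SPECIAL:          # must be checked first: SPK looks like a researcher code
        return "other"
    if NATIVE_RE.match(index):
        return "native"
    if RESEARCHER_RE.match(index):
        return "researcher"
    return "unknown"


if __name__ == "__main__":
    for idx in ["Dre01", "Lesh02", "MM", "SPK", "Non01"]:
        print(idx, classify_index(idx))
```

Checking the special indices before the general patterns matters here, since SPK would otherwise be indistinguishable from a three-letter researcher code.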

See list of speakers

Index Gender Status Year of birth Origin Residence Dialect Education Other languages spoken Family members Occupation Comment Classes graduated
AC f researcher 1990 Shahty Moscow Ph.D. Macedonian, Albanian researcher
AL m researcher 1988 Moscow Moscow Ph.D. Bulgarian, Turkish, English researcher
AV f researcher 1986 Hyvinkää Oulu MA Croatian, English researcher
Bob01 f native 1936 Boboshtica Boboshtica Boboshtica professional Albanian teacher of Albanian
Bob02 m native 1925 Boboshtica Boboshtica Boboshtica higher Albanian engineer higher education in Charles University in Prague
Bob03 m native 1930 Boboshtica Boboshtica Boboshtica professional Albanian teacher
Bor01 f native 1999 Borake Borake Borake higher (unfinished) Albanian bride of Bor02 student
Bor02 f native 1990 Borake Borake Borake middle Albanian brother of Bor11, son of Bor12 port worker
Bor04 m native 1935 Borake Borake Borake middle school (unfinished) Albanian port worker Classes graduated: 3 grades elementary, 1 grade school for tractors 4 grades
Bor06 f native 1926 Borake Borake Borake middle school (unfinished) Albanian grandmother of Bor19 farmer 3 grades
Bor07 f native 1935 Borake Borake Borake middle school (unfinished) Albanian father's sister-in-law of Bor19 farmer 3 grades
Bor08 m native 1984 Borake Zagreb Croatian higher Albanian IT
Bor10 m native 1947 Borake Borake Borake professional Albanian father of Bor09, Bor19 and Bor23, brother of Bor12 teacher of maths and Bosnian language at school
Bor11 m native 1979 Borake Borake Borake middle Albanian brother of Bor02, son of Bor12 worker
Bor12 m native 1950 Borake Borake Borake middle Albanian father of Bor11 and Bor02, husband of Bor25 worker
Bor15 m native 1957 Borake Sukth Borake middle Albanian father of Bor24 musician
Bor16 m native 1949 Borake Sukth Borake middle Albanian father of Bor18, husband of Bor17 worker, builder mother born in Potkosa, Domanovići
Bor17 f native 1949 Borake Sukth Borake middle Albanian mother of Bor18, wife of Bor16 farmer
Bor18 m native 1969 Borake Sukth Borake middle Albanian son of Bor16 and Bor17 port worker
Bor19 f native 1987 Borake Zagreb Borake higher Albanian teacher of English, nurse
Bor20 m native 1920 Borake Borake Borake middle school (unfinished) Albanian farmer 3 grades
Bor21 m native 1945 Borake Borake Borake middle school (unfinished) Albanian son of Bor22 farmer 7 grades
Bor22 m native 1920 Borake Borake Borake middle school (unfinished) Albanian father of Bor21 farmer 3 grades
Bor23 f native 1977 Borake Borake Borake higher Albanian teacher of biology and chemistry at school
Bor24 m native 1983 Borake Borake Borake middle Albanian son of Bor15 musician
Bor25 f native 1950 Borake Borake Borake middle school (unfinished) Albanian wife of Bor12 farmer 3 grades
Bor26 f native 1950 Borake Borake Borake middle school (unfinished) Albanian daughter of Bor04 farmer 3 grades
Bor27 f native 1940 Sukth Borake Borake middle Albanian wife of Bor04 farmer
Bor28 m native 1947 Borake Sukth Borake middle Albanian car master, hotel owner
Bor29 m native 1980 Borake Sukth Borake middle Albanian son of Bor28 waiter
Bor30 f native 1950 Sukth Sukth Borake middle Albanian wife of Bor28? cook she is Albanian, but has learnt the Borake dialect "in three months" after she got married
Bor32 f native 1938 Borake Borake Borake middle school (unfinished) Albanian farmer 7 grades
Bor33 f native 1972 Borake Durres Borake middle school (unfinished) Albanian trader 8 grades
Dre01 m native 1934 Drenova Korça Boboshtica higher Albanian brother of Dre02 and Dre03 teacher of maths
Dre02 f native 1927 Drenova Korça Boboshtica middle Albanian sister of Dre01 and Dre03 housewife
Dre03 m native 1922 Drenova Korça Boboshtica middle Albanian brother of Dre01 and Dre02 farmer
EU f researcher 1968 Moscow Moscow Ph.D. Bulgarian researcher
Elb01 m native 1995 Elbasan Elbasan Golloborda middle Albanian family from Stebleva
Elb02 m native 1979 Cerrik Cerrik Albanian middle Greek husband of Tre30, father of Tre31, Tre32, Tre29 builder
Erb01 f native 1975 Erbele Tirana Golloborda higher Albanian housewife
Tre80 m native Elbasan Albanian Only speaks Albanian
Gji01 f native 1970 Gjinovec Elbasan Golloborda middle Albanian
Gll01 m native 1943 Glloboçani Glloboçani Prespa middle Albanian farmer, worker
Gll02 m native 1958 Glloboçani Glloboçani Prespa professional Albanian administrator
Gor01 f native 1946 Shulin Gorna Gorica Prespa middle Albanian farmer
Gor02 m native 1952 Gorna Gorica Gorna Gorica Prespa middle Albanian farmer
Gor03 m native 1995 Gorna Gorica Gorna Gorica Prespa middle Albanian beekeeper
Gor05 f native 1955 Gorna Gorica Elbasan Prespa higher Albanian nurse left Prespa in 1973
Gor06 m native 1970 Gorna Gorica Gorna Gorica Prespa middle Albanian farmer
Gor07 m native 1948 Gorna Gorica Gorna Gorica Prespa middle school (unfinished) Albanian church administrator 7 grades
Gor08 m native 1970 Gorna Gorica Gorna Gorica Prespa middle Albanian political administrator
Gor09 m native 1960 Gorna Gorica Gorna Gorica Prespa middle Albanian businessman
Gor10 f native 1952 Gorna Gorica Gorna Gorica Prespa middle Albanian
Gor11 m native 1966 Gorna Gorica Gorna Gorica Prespa middle Albanian beekeeper
Gor12 f native 1966 Gorna Gorica Gorna Gorica Prespa middle Albanian father of Gor03, husband of Gor12 beekeeper
Gor13 m native 1935 Gorna Gorica Gorna Gorica Prespa middle school (unfinished) Albanian farmer
Gor14 m native 1965 Gorna Gorica Korça Prespa middle school (unfinished) Albanian 8 grades
Ham01 m native 1970 Hamil Fier Fier higher Albanian cousin of Rre04 lawyer mother from Bjelo Pole
Ham02 m native 1965 Hamil Hamil Fier middle Albanian son of Pet01 businessman
Kle01 f native 1982 Klenje Elbasan Golloborda middle Albanian
Kle02 m native 1977 Klenje Tirana Golloborda higher Albanian moved to Tirana in 1997
Kle04 m native 1983 Klenje Elbasan Golloborda middle Albanian hoxha moved to Elbasan in 2004
Kor01 m native 1972 Korça Korça Prespa (Tuminec) higher Albanian teacher lives in Korca from 1990
Kor02 m native 1969 Korça Pustec Prespa higher Albanian teacher
Kor03 m native 1969 Korça Pustec Prespa higher Albanian teacher
Lesh02 m native 1957 Leshnicani Elbasan Golloborda professional Albanian worker moved to Elbasan in 1963
Lesh03 m native 1932 Leshnicani Elbasan Golloborda middle Albanian farmer left Trebisht in 1944, moved to Durrës
Lesh04 m native 1963 Leshnicani Elbasan Golloborda middle Albanian worker
Lesh05 m native 1967 Leshnichani Tirana Golloborda middle Albanian businessman, worker
Lesh06 m native 1969 Leshnichani Tirana Golloborda middle Albanian worker
Lesh07 f native 1967 Leshnichani Fushë Kruja Golloborda middle Albanian daughter of Lesh05
Lesh08 m native 1952 Elbasan Elbasan Golloborda middle Albanian son of Lesh03
Lesh09 f native 1952 Elbasan Elbasan Albanian middle Albanian wife of Lesh09 Albanian speaker, origin from Trebishta
Lesh10 f native 1972 Elbasan Elbasan Albanian middle Albanian daughter of Lesh09 and Lesh10
Lla01 m native 1948 Lladimerica Elbasan Golloborda middle Albanian
Lla02 f native 1938 Lladimerica Tirana Golloborda middle school (unfinished) Albanian 4 grades
MC m researcher 1991 Moscow Moscow Ph.D. Macedonian, English researcher
MM m researcher 1984 Moscow Oldenburg Ph.D. Albanian, Macedonian, BCMS, English researcher
MMI f researcher Sofia Calgary higher Albanian, English researcher
Mal02 m native 1955 Malestreni Tirana Golloborda middle Albanian military
Mal04 m native 1984 Malestreni Tirana Golloborda higher Albanian researcher
NK f researcher 1992 Železnogorsk Moscow Ph.D. Macedonian, BCMS researcher
NM f researcher 1995 Altdorf Zürich high school English, Macedonian researcher
Non01 all non-transcribed speakers
Ost01 f native 1975 Ostren Elbasan Golloborda middle Albanian
Ost02 f native 1974 Ostreni Elbasan Golloborda middle Albanian
Pet01 f native 1940 Petovë Hamil Fier middle school (unfinished) Albanian aunt of Rre04 farmer 4 grades
Pus01 f native 1995 Pustec Pustec Prespa higher Albanian vnu'če od zo'lva of Pus09, daughter-in-law of Pus23 teacher of sports at school
Pus03 f native 1969 Pustec Korça Prespa middle Albanian trader
Pus04 m native 1953 Pustec Pustec Prespa middle Albanian political administrator, journalist, writer
Pus05 f native 1965 Pustec Pustec Prespa middle Albanian farmer
Pus06 m native 1983 Pustec Pustec Prespa higher Albanian political administrator, historian, higher education in Skopje
Pus08 f native 1960 Pustec Pustec Prespa middle Albanian farmer
Pus09 f native 1960 Pustec Pustec Prespa middle Albanian posestrima of Pus24 farmer
Pus11 m native 1960 Pustec Pustec Prespa middle Albanian farmer
Pus12 m native 1970 Pustec Pustec Prespa middle Albanian teacher
Pus13 f native 1960 Pustec Pustec Prespa middle Albanian
Pus14 m native 1935 Pustec Pustec Prespa middle Albanian
Pus15 m native 1960 Pustec Pustec Prespa middle Albanian farmer
Pus16 m native 1978 Pustec Korça Prespa higher Albanian teacher of MK graduated in Skopje
Pus17 f native 1990 Korça Pustec Prespa middle Albanian wife of Pus18 farmer
Pus18 m native 1990 Pustec Pustec Prespa middle Albanian husband of Pus17
Pus19 m native 1970 Pustec Pustec Prespa middle Albanian father of Pus18? farmer
Pus20 f native 1950 Pustec Pustec Prespa middle Albanian farmer
Pus21 f native 1950 Pustec Pustec Prespa middle Albanian farmer
Pus23 f native 1943 Gorna Gorica Pustec Prespa middle school (unfinished) Albanian mother-in-law of Pus01 farmer 3 grades
Pus24 f native 1973 Pustec Pustec Prespa middle Albanian posestrima of Pus09
Rre01 m native 1968 Rreth Libofsha Rreth Libofsha Fier middle Albanian son of Rre02 and Rre03 businessman
Rre02 m native 1942 Rreth Libofsha Rreth Libofsha Fier middle Albanian husband of Rre04, father of Rre01 farmer father from Bujca
Rre03 f native 1945 Rreth Libofsha Rreth Libofsha Fier middle Albanian wife of Rre02, mother of Rre01 farmer mother from Jaˋblanica, father from Donane
Rre04 m native 1957 Rreth Libofsha Rreth Libofsha Fier middle Albanian general worker
Rre06 m native 1951 Rreth Libofsha Fier Fier middle Albanian businessman
Pus22 m native 1980 Pustec Pustec Prespa higher Albanian political administrator
Ste01 m native 1975 Stebleva Elbasan Golloborda middle Albanian
Ste02 f native 1942 Stebleva Elbasan Golloborda middle Albanian
Ste03 m native 1971 Stebleva Elbasan Golloborda middle Albanian
Ste04 m native 1968 Stebleva Elbasan Golloborda middle Albanian
Ste05 f native 1972 Stebleva Elbasan Golloborda middle Albanian
Ste07 m native 1962 Stebleva Elbasan Golloborda middle Albanian
TG f researcher 1983 Magadan Moscow Ph.D. Macedonian, BCMS researcher
Tir01 m native 1960 Tirana Tirana Golloborda middle Albanian trader
Tir02 f native 2009 Fushë Kruja Fushë Kruja Golloborda none Albanian daughter of Lesh07 preschool student
Tre01 f native 1946 Trebisht Trebisht Golloborda middle Albanian mother of Tre20, Tre27, Tre30 farmer
Tre02 m native 1964 Bulqiza Tirana Golloborda professional Albanian military family from Trebishta, born in Bulqiza
Tre03 f native 1956 Trebisht Trebisht Golloborda middle Albanian mother of Tre19 farmer
Tre04 m native 1936 Trebisht Trebisht Golloborda middle Albanian farmer
Tre05 f native 1936 Trebisht Trebisht Golloborda middle Albanian farmer
Tre07 m native 1994 Trebisht Trebisht Albanian middle Albanian farmer
Tre10 m native 2016 Trebisht Trebisht Golloborda middle Albanian son of Tre11 farmer
Tre11 f native 1991 Trebisht Trebisht Golloborda middle Albanian daughter of Tre17 farmer has been living in Croatia for the last 2-3 years
Tre12 f native 1944 Trebisht Trebisht Golloborda middle Albanian farmer
Tre13 f native 2003 Trebisht Trebisht Golloborda middle Albanian farmer
Tre14 m native 1968 Trebisht Trebisht Golloborda middle Albanian farmer
Tre15 f native 1970 Trebisht Elbasan Golloborda middle Albanian housewife
Tre17 f native 1963 Trebisht Trebisht Golloborda middle Albanian farmer
Tre18 f native 2005 Trebisht Trebisht Golloborda middle Albanian school student
Tre19 m native 1977 Trebisht Trebisht Golloborda none Albanian builder
Tre20 f native 1977 Trebisht Trebisht Golloborda middle Albanian sister of Tre27, Tre30, daughter of Tre01 housewife
Tre21 m native 2006 Trebisht Trebisht Golloborda middle school student Albanian school student
Tre22 f native 2003 Trebisht Trebisht Golloborda middle school student Albanian school student
Tre24 f native 2003 Trebisht Trebisht Golloborda middle school student Albanian school student
Tre25 m native 1996 Trebisht Trebisht Golloborda higher (unfinished) Albanian student
Tre27 m native 1970 Trebisht Trebisht Golloborda middle Albanian, Greek, BCMS brother of Tre20, Tre30, son of Tre01 general worker
Tre29 m native 2006 Trebisht Trebisht Golloborda middle school student Albanian son of Tre30 and Elb02, brother of Tre31 and Tre32 school student
Tre30 f native 1980 Trebisht Cerrik Golloborda middle Albanian sister of Tre27, Tre20, daughter of Tre01, wife of Elb02, mother of Tre31, Tre32, Tre29 housewife
Tre31 f native 2002 Trebisht Elbasan Golloborda higher (unfinished) Albanian daughter of Tre30 and Elb02, sister of Tre32 and Tre29 school student going to school in Ohrid
Tre32 f native 2002 Trebisht Trebisht Golloborda higher (unfinished) Albanian daughter of Tre30 and Elb02, sister of Tre31 and Tre29 school student going to school in Ohrid
Tre33 m native 1956 Trebisht Elbasan Golloborda middle Albanian
Tre34 f native 1959 Trebisht Elbasan Golloborda middle school (unfinished) Albanian 4 grades
Tre35 m native 1947 Trebisht Elbasan Golloborda middle Albanian
Tre37 f native 1948 Trebisht Elbasan Golloborda middle Albanian
Tre38 m native 1984 Trebisht Trebisht Golloborda middle Albanian works in Croatia and Germany
Tre39 m native 1955 Tirana Tirana Golloborda middle Albanian father from Vërnica, mother from Malestreni. Family moved to Tirana in 1954
Tre40 f native 1957 Përmet Tirana Golloborda middle Albanian parents are from Përmet
Tre41 m native 1930 Trebisht Tirana Golloborda higher Albanian
Tre42 m native 1960 Trebisht Tirana Golloborda middle Albanian
Tre43 m native 1991 Tirana Tirana Golloborda middle Albanian Born and raised in Tirana, now lives in Tirana, speaks AL at home
Tre44 m native 1996 Tirana Tirana Golloborda middle Albanian Born and raised in Tirana, now lives in Tirana, speaks AL at home
Tre45 m native 1966 Trebisht Elbasan Golloborda middle Albanian Born and raised in Trebisht, now lives in El, speaks AL at home
Tre46 m native 1990 Trebisht Trebisht Golloborda middle Albanian
Tre47 m native Trebisht Trebisht Golloborda Albanian
Tre48 f native 1989 Trebisht Trebisht Golloborda middle Albanian
Tre49 f native 2011 Trebisht Trebisht Golloborda middle school student Albanian
Tre50 f native 2014 Trebisht Trebisht Golloborda middle school student Albanian
Tre51 m native 2006 Trebisht Fushë Kruja Golloborda middle school student Albanian
Tre52 m native 1967 Cerrik Cerrik Golloborda middle Albanian husband of Tre53
Tre53 f native 1967 Trebisht Cerrik Golloborda middle Albanian wife of Tre52 moved in 1993 to Cerrik because of her marriage to Tre52
Tre54 f native 2001 Cerrik Cerrik Golloborda higher Albanian
Tre55 f native 1949 Elbasan Trebisht Golloborda middle school (unfinished) Albanian 7 grades
Tre56 f native Trebisht Tirana Golloborda Albanian
Tre57 m native 2015 Trebisht Tirana Golloborda middle school student Albanian
Tre59 f native 1999 Tirana Tirana Golloborda student Albanian
Tre60 m native 1949 Trebishta Tirana Golloborda middle school (unfinished) Albanian 7 grades
Tre61 f native 1952 Trebishta Tirana Golloborda middle school (unfinished) Albanian 7 grades
Tre62 f native 1971 Trebishta Elbasan Golloborda middle Albanian
Tre63 m native 1938 Trebishta Elbasan Golloborda middle Albanian
Tre64 m native Trebishta Elbasan Golloborda middle Albanian
Tre65 m native 1940 Trebishta Elbasan Golloborda middle school (unfinished) Albanian
Tre66 m native Trebishta Elbasan Golloborda Albanian
Tre67 m native 2007 Trebishta Tirana Golloborda middle school student Albanian
Tre68 m native 2008 Trebishta Tirana Golloborda middle school student Albanian
Tre69 f native 1964 Elbasan Elbasan Golloborda middle Albanian
Tre71 f native Trebisht Tirana Golloborda Albanian
Tre72 m native 2001 Trebishta Trebishta Golloborda higher Albanian student in Skopje
Tre73 f native 1972 Trebishta Trebishta Golloborda middle Albanian
Tre75 m native 1964 Trebishta Trebishta Golloborda middle Albanian
Tre76 m native 1992 Elbasan Trebishta Golloborda middle Albanian moved to Trebishta in 2003
Tre77 m native 1996 Trebishta Trebishta Golloborda middle Albanian graduated in Prilep
Tum01 m native 1956 Tuminec Tuminec Prespa (Tuminec) middle Albanian
Tum02 f native 2015 Tuminec Tuminec Prespa (Tuminec) preschool
Tum03 f native 1994 Tuminec Pustec Prespa (Tuminec) higher Albanian
Tum04 f native 1952 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 2 grades
Tum05 f native 1977 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 8 grades
Tum06 m native 1939 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian brother of Tum07, husband of Tum08 musician
Tum07 f native 1954 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian wife of Tum06
Tum08 f native 1940 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian sister of Tum06
Tum09 m native 1984 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 7 grades
Tum10 m native 1964 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 7 grades
Tum11 f native 1964 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian mother of Tum09 7 grades
Tum12 m native 1964 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian father of Tum09 5 grades
Tum13 f native 1964 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 7 grades
Tum14 m native 2000 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 7 grades
Tum15 m native 1940 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian cousin of Tum06, father of Tum16 farmer 7 grades
Tum16 m native 1960 Tuminec Tuminec Prespa (Tuminec) middle Albanian son of Tum15
Tum17 m native 1955 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 7 grades
Tum18 f native 1931 Tuminec Tuminec Prespa (Tuminec) middle school (unfinished) Albanian 3 grades
Tum19 m native 1999 Elbasan Korça Prespa (Tuminec) middle Albanian
Tum20 m native 1947 Tuminec Korça Prespa (Tuminec) middle school (unfinished) Albanian
Tum21 f native 1952 Tuminec Korça Prespa (Tuminec) middle school (unfinished) Albanian
Tum22 f native 2010 Elbasan Korça Prespa (Tuminec) middle school student Albanian
Tum23 f native 1970 Tuminec Tuminec Prespa (Tuminec) middle Albanian vujna of Tum19, Kor01 farmer
Vor02 m native 1971 Tirana Tirana Golloborda higher Albanian priest origin from Vërnica

Transcripts

Our team prepared all transcripts manually. Where possible, trained local assistants (speakers of the dialects who also organized the family discussions) prepared the draft versions. Our editors (specialists in the respective philologies) proofread these drafts, after which Dr. Maxim Makartsev proofread them a second time. Where trained local assistants were unavailable, our editors prepared the transcripts themselves. We used EXMARaLDA Partitur Editor to align the transcripts with the recordings. Albanian and other languages were written in their original scripts. Only the passages in Slavic dialects were proofread; passages in Albanian and other languages were transcribed for contextual purposes only and were not proofread. (Hence, their spelling ranges from standard to non-orthographic semi-phonetic spelling.)

Transcription

Our transcription system was as follows:

Vocalism

i    ü (Alb yll, ky)    u
e    ă (schwa)    o
a

Consonantism

Sonorants
m
l    r    n
j    lj    nj

We did not distinguish between the tap and the trill r in Slavic (cf. Albanian [r] as in arrë and [ɾ] as in erë).

Obstruents
p    b    f    v
t    d    θ    ð
c    dz    s    z
č    š    ž
ć    đ    ś    ź
kʼ (Macedonian ќ)    gʼ (Macedonian ѓ)
k    g    h

Prosody

Stress was indicated by an apostrophe (') following the accented vowel, including in one-syllable words. This was necessary to show the position of the stress in phonetic words that included clitics and other unstressed words (prepositions, etc.). Pauses were indicated only within narratives; where the speaker listed words or uttered separate sentences and fragments, pauses were not marked. A pause was indicated as follows: ((0.5)) for a 500 ms pause.
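The two conventions above (an apostrophe after the stressed vowel, and pauses written as seconds in double parentheses) are regular enough to be read by a script. The following is only a minimal sketch: the function names, the vowel inventory string, and the sample strings in the usage note are illustrative assumptions and not part of the corpus tooling.

```python
import re

# A pause is written as ((seconds)), e.g. ((0.5)) for 500 ms.
PAUSE_RE = re.compile(r"\(\((\d+(?:\.\d+)?)\)\)")

# Vowel symbols of the transcription system (assumed to be precomposed characters).
VOWELS = "aeiouăü"


def pause_durations_ms(line):
    """Return the durations of all pause marks in a line, in milliseconds."""
    return [round(float(seconds) * 1000) for seconds in PAUSE_RE.findall(line)]


def stressed_vowel_index(word):
    """Return the 0-based position of the stressed vowel among the word's vowels.

    Returns None if no vowel is followed by the stress apostrophe (').
    The modifier apostrophe in kʼ and gʼ is a different character and is ignored.
    """
    vowel_count = 0
    for i, ch in enumerate(word):
        if ch in VOWELS:
            if i + 1 < len(word) and word[i + 1] == "'":
                return vowel_count
            vowel_count += 1
    return None
```

With these definitions, `pause_durations_ms("a ((0.5)) b")` yields `[500]`, and `stressed_vowel_index("jazi'kot")` yields `1` (the second vowel); the stress placement in this sample string is invented for illustration only.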

For several Štokavian transcripts (e.g., 33 (1), 33 (2), 33 (3)), we also tried to indicate the accent type, which corresponded to the conventions for the respective standards, as follows (symbols accepted in the BCMS national linguistic traditions are provided before the equal sign, with our symbols thereafter):

                long        short
rising tone     é = e':     è = e'
falling tone    ȇ = eˋ:     ȅ = eˋ
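The correspondence between the traditional BCMS accent marks and the corpus symbols can be sketched as a small converter. This is an illustration under stated assumptions, not the procedure used by the project: the function name is hypothetical, the falling-tone symbol is taken to be the modifier letter grave accent as printed in the table above, and only vowels and r are treated as possible accent bearers, so that letters such as ć, ś and ź keep their diacritic.

```python
import unicodedata

FALLING = "ˋ"  # falling-tone symbol as printed in the table (assumed U+02CB)

# Combining accent mark -> corpus notation placed after the vowel.
ACCENT_TO_CORPUS = {
    "\u0301": "':",           # acute: long rising
    "\u0300": "'",            # grave: short rising
    "\u0311": FALLING + ":",  # inverted breve: long falling
    "\u030f": FALLING,        # double grave: short falling
}

# Letters assumed to carry an accent (vowels plus r; an assumption of this sketch).
ACCENT_BEARERS = set("aeiourAEIOUR")


def to_corpus_accents(text):
    """Rewrite BCMS accent marks on vowels into the corpus notation."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENT_TO_CORPUS and out and out[-1] in ACCENT_BEARERS:
            out.append(ACCENT_TO_CORPUS[ch])
        else:
            # Diacritics on consonants (e.g. the acute of ć) are kept as they are.
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Decomposing the input first means that precomposed and decomposed spellings of the same accented vowel are handled alike, and the final recomposition restores letters such as č and ć to single characters.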

The automatic processing of the Štokavian data fully relied on the presented system being close to Gajica. For the automatic processing of Macedonian, the script was automatically recoded into standard Macedonian Cyrillic, following the procedure for Classla 2.1.1. The only place Cyrillic was used in the user interface was in the Macedonian lemmatization.

Annotation

We followed several steps for the annotation.

First, we formulated rules to identify the language of each word form. The rules relied on the tags that the transcribers and editors had entered manually in EXMARaLDA Partitur Editor, on language-specific scripts (e.g., Greek, Russian), on symbols (e.g., ë, specific to Albanian), and on combinations of symbols (e.g., ll, rr, and initial ng and mb, specific or unique to Albanian).
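Rules of this kind can be sketched as a short cascade: a manual tag wins, then the script of the word form is checked, then the Albanian-specific symbols and combinations. The code below is a simplified illustration only; the function name, the returned labels, and the order of the checks are assumptions, and the actual rules of the project may differ.

```python
import re
import unicodedata

# Symbols and combinations named above as specific to Albanian:
# ë, ll, rr anywhere in the word; ng and mb only word-initially.
ALBANIAN_HINTS = re.compile(r"ë|ll|rr|^ng|^mb", re.IGNORECASE)


def guess_language(word_form, manual_tag=None):
    """Return a rough language or script label for one word form."""
    # 1. A tag entered manually by a transcriber or editor takes precedence.
    if manual_tag:
        return manual_tag
    # 2. Language-specific scripts are recognized by Unicode character names.
    for ch in word_form:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("GREEK"):
                return "Greek script"
            if name.startswith("CYRILLIC"):
                return "Cyrillic script"
    # 3. Albanian-specific symbols and symbol combinations.
    if ALBANIAN_HINTS.search(word_form):
        return "Albanian"
    # 4. Everything else is left for the Slavic analyzers.
    return "unmarked"
```

For example, `guess_language("është")` returns `"Albanian"` because of ë, whereas a form without any of the listed cues, such as `guess_language("zborvi")`, returns `"unmarked"`. A heuristic like this cannot replace the manual tags, since an Albanian word need not contain any of the cues.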

Second, the respective parsers and taggers were applied depending on the language of the word form (see the table above). Following this, only those parts of the transcripts that the speakers uttered (not the researchers) in Slavic dialects were manually and semi-automatically checked, proofread, and edited. The parts of the transcripts that the researchers uttered or that speakers uttered in other language varieties were not proofread. In such cases, we kept the automatic annotation.

Third, the lemmatization was checked manually for Slavic; the lemmas automatically marked as Albanian were checked selectively and corrected where needed. For Golloborda, Korça, and Prespa Macedonian, the lemmatization was performed in standard Macedonian, the structurally closest standard language. For Myzeqe and Shijak, lemmatization was based on Ijekavian standards. For dialectal lexemes that do not exist in the respective standards, standard phonology was applied, resulting in dummy lemmata that follow standard phonology but cannot be found in standard dictionaries. The only function of these lemmata is to allow for the trans-dialectal search of word forms. If a standard cognate could not be established, we adopted any suitable word form attested in our transcripts.

Fourth, the morphological tags for Macedonian and Štokavian were harmonized, since Classla 2.1.1 uses slightly different MULTEXT-East conventions for the two varieties. The resulting tag set is provided below.

Fifth, the results of the morphological tagging were selectively checked. We focused on the word forms with the greatest homonymy and the lexemes with the most frequent tokens.

This is the beta version of our corpus, so the manual editing of the morphological tags is ongoing. If you notice an error, please feel free to contact us. When working with the corpus, manual checking of the search results is highly recommended.

Considering the principally bilingual nature of our data and frequent code-switches, we would like to highlight two instruments here:

1) The corpus allows for searching sets of word forms that are specifically ordered (one after another or with one or more irrelevant word forms in between). You may compose the search entry by marking one of the word forms as Slavic (you can choose the dialect) and another as Albanian, which will show all cases of code-switches that follow your chosen parameters.

2) The special field Foreign includes all lexical matter borrowings and congruent lexicalizations from Albanian. (They cannot be formally distinguished since both types have Albanian stems and Slavic inflectional morphology.)
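The logic behind the first instrument (an ordered pair of word forms in two different languages, optionally separated by a limited number of other word forms) can be sketched in a few lines. This is not the query syntax or the implementation of the corpus platform; the function name, the token representation as (form, language) pairs, and the placeholder forms are assumptions made for illustration.

```python
def find_switches(tokens, first_lang, second_lang, max_gap=0):
    """Return index pairs (i, j) where a first_lang token is followed by a
    second_lang token with at most max_gap other tokens in between."""
    hits = []
    for i, (_, lang_i) in enumerate(tokens):
        if lang_i != first_lang:
            continue
        # Look at the next max_gap + 1 positions after token i.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j][1] == second_lang:
                hits.append((i, j))
    return hits


# Placeholder sequence: s = Slavic word form, a = Albanian word form.
tokens = [
    ("s1", "Prespa"),
    ("s2", "Prespa"),
    ("a1", "Albanian"),
    ("s3", "Prespa"),
    ("a2", "Albanian"),
]
```

With this placeholder sequence, `find_switches(tokens, "Prespa", "Albanian")` returns `[(1, 2), (3, 4)]`, i.e., only directly adjacent pairs, while `max_gap=1` additionally returns `(0, 2)`. Swapping the two language arguments finds switches in the opposite direction.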

We also distinguished direct speech (tag OWN) and quotations (XENO). Several other tags provide additional information about the intonation and the context of the interview, e.g., ((LAUGH)), ((COUGH)), ((NOISE)).

Metadata

  • Transcript ID
  • Year and location of the recording
  • Sub-corpus (dialect)
  • Code of the speaker
  • Code of the researcher who participated in the recording and transcribing of the interview
  • The type of interview (whether the researchers were present or absent)
  • The speaker’s birthplace, birth year, gender, and occupation; other sociolinguistic data when relevant; familial relations when relevant
  • Current place of residence of the speaker
  • Genre

List of transcripts


Transcript_ID Dialectal type Dialect Place of recording Genre Researcher attending Settlement type Fully accentuated
2 Štokavian Shijak Borake Interview yes Village no
10 Balkan Slavic Korça Boboshtica Interview yes Village no
12 Balkan Slavic Korça Boboshtica Interview yes Village no
13 Balkan Slavic Korça Boboshtica Frog yes Village no
14 Balkan Slavic Korça Boboshtica Frog yes Village no
15 Balkan Slavic Korça Boboshtica Frog yes Village no
16 Balkan Slavic Golloborda Tirana Interview yes City no
17 Balkan Slavic Golloborda Tirana Interview yes City no
18 Balkan Slavic Golloborda Tirana Interview yes City no
19 Balkan Slavic Golloborda Tirana Interview yes City no
20 Balkan Slavic Golloborda Elbasan Interview yes City no
22 Balkan Slavic Golloborda Elbasan Frog yes City no
24 Balkan Slavic Prespa Pustec Interview yes Village no
25 Balkan Slavic Prespa Korça Interview yes City no
26 Balkan Slavic Golloborda Elbasan Interview yes City no
27 Balkan Slavic Korça Korça Interview yes City no
28 Balkan Slavic Korça Korça Interview yes City no
30 Balkan Slavic Korça Korça Frog yes City no
31 Balkan Slavic Korça Korça Frog yes City no
32 Balkan Slavic Korça Korça Frog yes City no
35 Štokavian Shijak Borake Frog yes Village no
36 Balkan Slavic Golloborda Elbasan Frog yes City no
37 Balkan Slavic Prespa Elbasan Interview yes City no
38 Balkan Slavic Golloborda Tirana Frog yes City no
39 Balkan Slavic Golloborda Tirana Interview yes City no
40 Balkan Slavic Golloborda Elbasan Interview yes City no
41 Balkan Slavic Golloborda Elbasan Interview yes City no
42 Balkan Slavic Golloborda Tirana Interview yes City no
44 Balkan Slavic Golloborda Elbasan Frog yes City no
45 Balkan Slavic Golloborda Trebisht Interview yes Village no
48 Balkan Slavic Golloborda Trebisht Interview yes Village no
49 Balkan Slavic Golloborda Trebisht Interview yes Village no
50 Balkan Slavic Golloborda Trebisht Interview yes Village no
51 Balkan Slavic Golloborda Trebisht Interview yes Village no
52 Balkan Slavic Golloborda Trebisht Interview yes Village no
54 Balkan Slavic Golloborda Trebisht Interview yes Village no
55 Balkan Slavic Golloborda Trebisht Interview yes Village no
56 Balkan Slavic Golloborda Trebisht Interview yes Village no
57 Balkan Slavic Golloborda Trebisht Interview yes Village no
58 Balkan Slavic Golloborda Trebisht Interview yes Village no
59 Balkan Slavic Golloborda Trebisht Interview yes Village no
60 Balkan Slavic Golloborda Trebisht Interview yes Village no
61 Balkan Slavic Golloborda Trebisht Interview yes Village no
62 Balkan Slavic Golloborda Trebisht Interview yes Village no
63 Balkan Slavic Prespa Pustec Family no Village no
65 Balkan Slavic Prespa Tuminec Family no Village no
67 Balkan Slavic Prespa Tuminec Family no Village no
70 Štokavian Shijak Borake Interview yes Village yes
72 Štokavian Shijak Borake Interview yes Village yes
73 Balkan Slavic Prespa Korça Frog yes City no
74 Balkan Slavic Prespa Korça Interview yes City no
75 Balkan Slavic Prespa Korça Frog yes City no
76 Štokavian Shijak Borake Interview yes Village no
77 Štokavian Shijak Borake Interview yes Village no
78 Balkan Slavic Prespa Elbasan Frog yes City no
79 Balkan Slavic Prespa Korça Frog yes City no
80 Balkan Slavic Prespa Korça Interview yes City no
81 Balkan Slavic Prespa Korça Interview yes City no
82 Balkan Slavic Prespa Pustec Frog yes Village no
83 Balkan Slavic Prespa Globoçeni Frog yes Village no
84 Balkan Slavic Prespa Globoçeni Interview yes Village no
86 Balkan Slavic Prespa Pustec Interview yes Village no
90 Balkan Slavic Prespa Globoçeni Interview yes Village no
91 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
94 Balkan Slavic Prespa Pustec Interview yes Village no
96 Balkan Slavic Prespa Korça Interview yes City no
98 Balkan Slavic Prespa Korça Interview yes City no
99 Balkan Slavic Golloborda Trebisht Family no Village no
100 Balkan Slavic Golloborda Trebisht Family no Village no
101 Balkan Slavic Golloborda Trebisht Family no Village no
104 Balkan Slavic Golloborda Trebisht Frog no Village no
105 Balkan Slavic Golloborda Trebisht Family no Village no
106 Balkan Slavic Golloborda Trebisht Family no Village no
108 Balkan Slavic Golloborda Trebisht Frog no Village no
109 Balkan Slavic Golloborda Trebisht Family no Village no
110 Balkan Slavic Golloborda Trebisht Family no Village no
111 Balkan Slavic Golloborda Trebisht Family no Village no
112 Balkan Slavic Golloborda Trebisht Frog no Village no
113 Balkan Slavic Golloborda Trebisht Frog no Village no
114 Balkan Slavic Golloborda Elbasan Frog no City no
116 Balkan Slavic Golloborda Elbasan Frog no City no
117 Balkan Slavic Golloborda Elbasan Family no City no
118 Balkan Slavic Golloborda Tirana Frog no City no
119 Balkan Slavic Golloborda Tirana Family no City no
121 Balkan Slavic Golloborda Elbasan Frog no City no
122 Balkan Slavic Golloborda Elbasan Family no City no
123 Balkan Slavic Golloborda Elbasan Frog no City no
124 Balkan Slavic Golloborda Tirana Frog no City no
125 Balkan Slavic Golloborda Elbasan Frog no City no
126 Balkan Slavic Golloborda Tirana Family no City no
127 Balkan Slavic Golloborda Elbasan Frog no City no
128 Balkan Slavic Golloborda Elbasan Family no City no
130 Balkan Slavic Golloborda Tirana Frog no City no
131 Balkan Slavic Golloborda Elbasan Family no City no
133 Balkan Slavic Golloborda Tirana Frog no City no
134 Balkan Slavic Golloborda Tirana Frog no City no
135 Balkan Slavic Golloborda Tirana Family no City no
136 Balkan Slavic Golloborda Tirana Frog no City no
137 Balkan Slavic Golloborda Tirana Family no City no
138 Balkan Slavic Golloborda Tirana Frog no City no
139 Balkan Slavic Golloborda Tirana Family no City no
140 Balkan Slavic Golloborda Elbasan Family no City no
145 Balkan Slavic Prespa Tuminec Family no Village no
146 Štokavian Myzeqe Hamil Interview yes Village no
150 Štokavian Myzeqe Fier Interview yes City no
151 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
152 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
153 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
154 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
155 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
156 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
157 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
158 Štokavian Myzeqe Rreth Libofsha Frog yes Village no
159 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
160 Štokavian Myzeqe Rreth Libofsha Interview yes Village no
161 Štokavian Shijak Sukth Interview yes Village no
162 Štokavian Shijak Sukth Interview yes Village no
163 Štokavian Shijak Sukth Interview yes Village no
164 Štokavian Shijak Sukth Interview yes Village no
165 Štokavian Shijak Borake Frog yes Village no
166 Štokavian Shijak Borake Interview yes Village no
168 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
169 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
170 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
171 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
172 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
173 Balkan Slavic Prespa Dolna Gorica Frog yes Village no
174 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
175 Balkan Slavic Prespa Dolna Gorica Interview yes Village no
176 Balkan Slavic Prespa Dolna Gorica Frog yes Village no
177 Balkan Slavic Prespa Pustec Interview yes Village no
178 Balkan Slavic Prespa Tuminec Interview yes Village no
179 Balkan Slavic Prespa Tuminec Interview yes Village no
180 Balkan Slavic Prespa Tuminec Interview yes Village no
181 Balkan Slavic Prespa Tuminec Interview yes Village no
182 Balkan Slavic Prespa Tuminec Interview yes Village no
183 Balkan Slavic Prespa Tuminec Interview yes Village no
184 Balkan Slavic Prespa Korça Family no City no
186 Štokavian Myzeqe Fier Interview yes City no
187 Štokavian Myzeqe Fier Frog yes City no
11 (1) Balkan Slavic Korça Boboshtica Interview yes Village no
11 (2) Balkan Slavic Korça Boboshtica Interview yes Village no
185 (1) Štokavian Myzeqe Fier Interview yes City no
185 (2) Štokavian Myzeqe Fier Interview yes City no
29 (1) Balkan Slavic Korça Korça Interview yes City no
29 (2) Balkan Slavic Korça Korça Interview yes City no
29 (3) Balkan Slavic Korça Korça Interview yes City no
3 (1) Štokavian Shijak Borake Interview yes Village no
3 (2) Štokavian Shijak Borake Interview yes Village no
33 (1) Štokavian Myzeqe Rreth Libofsha Interview yes Village yes
33 (2) Štokavian Myzeqe Rreth Libofsha Interview yes Village yes
33 (3) Štokavian Myzeqe Rreth Libofsha Interview yes Village yes
33 (4) Štokavian Myzeqe Rreth Libofsha Interview yes Village yes
43 (1) Balkan Slavic Golloborda Elbasan Interview yes City no
43 (2) Balkan Slavic Golloborda Elbasan Interview yes City no
46 (1) Balkan Slavic Golloborda Trebisht Interview yes Village no
46 (2) Balkan Slavic Golloborda Trebisht Interview yes Village no
5 (1) Štokavian Shijak Borake Interview yes Village no
5 (2) Štokavian Shijak Borake Interview yes Village no
5 (3) Štokavian Shijak Borake Interview yes Village no
5 (4) Štokavian Shijak Borake Interview yes Village no
53 (1) Balkan Slavic Golloborda Trebisht Interview yes Village no
53 (2) Balkan Slavic Golloborda Trebisht Interview yes Village no
53 (3) Balkan Slavic Golloborda Trebisht Interview yes Village no
85 (1) Balkan Slavic Prespa Dolna Gorica Interview yes Village no
85 (2) Balkan Slavic Prespa Dolna Gorica Interview yes Village no
87 (1) Štokavian Shijak Sukth Interview yes Village no
87 (2) Štokavian Shijak Sukth Frog yes Village no
87a Štokavian Shijak Sukth Frog yes Village no
88 (1) Balkan Slavic Prespa Pustec Interview yes Village no
88 (2) Balkan Slavic Prespa Pustec Interview yes Village no
89 (1) Balkan Slavic Prespa Pustec Interview yes Village no
89 (2) Balkan Slavic Prespa Pustec Interview yes Village no
92 (1) Balkan Slavic Prespa Pustec Interview yes Village no
92 (2) Balkan Slavic Prespa Pustec Interview yes Village no
93 (1) Balkan Slavic Prespa Dolna Gorica Interview yes Village no
93 (2) Balkan Slavic Prespa Dolna Gorica Interview yes Village no
95 (1) Štokavian Shijak Borake Interview yes Village no
95 (2) Štokavian Shijak Borake Interview yes Village no
95 (3) Štokavian Shijak Borake Interview yes Village no
188 Štokavian Shijak Borake Family no Village no

Tag set for SDAs

Originally, the tag set was based on MULTEXT-East morphosyntactic specifications. Korça Macedonian, Prespa Macedonian, and Golloborda Macedonian were based on Macedonian specifications; Myzeqe Štokavian and Shijak Štokavian were based on Serbo-Croatian specifications.

We introduced several changes to 1) harmonize the Macedonian and Štokavian parsers and taggers that otherwise followed somewhat different principles and conventions; 2) adapt the tag set to the terminology most widely used in Slavic linguistics (e.g., the use of the term “imperfective aspect” instead of “progressive aspect”); 3) unify all other possible idiosyncrasies (e.g., MULTEXT-East morphosyntactic specifications for Serbo-Croatian do not have verbal aspects, so these had to be introduced for our corpus).

The tag set for the Albanian language section was developed by Maria Morozova, Alexander Rusakov, and Timofey Arkhangelskiy for the Albanian National Corpus and can be found here. Albanian tags are preceded by the prefix sq: here to avoid confusion with homonymous Slavic tags.

The grammatical features of the words in the corpus are marked with short tags. In tags, abbreviations are capitalized, while full words are not.


Here is the full list of Slavic tags and other abbreviations:

Tag Reading Comments and examples
1 first person
2 second person
3 third person
Acc accusative case
Adm admirative (treated as mood for the simplicity of the description)
ADJ adjective (POS)
ADP adposition = preposition (POS)
ADV adverb (POS)
AdvPronDem demonstrative
AdvPronEmp emphatic
AdvPronInt interrogative
AdvPronNeg negative (adverbs, pronouns)
AdvPronRel relational
Aor aorist
Aux auxiliary
biaspectual biaspectual verb
Card cardinal (numerals)
CCONJ coordinating conjunction (POS)
Cmp comparative degree of adjectives and adverbs
Cnd conditional mood
Cnv converb (verbal adverb)
Dat dative case
Def definite form (of adjectives in Štokavian, of various POS in Macedonian)
Dist distal form (of deictic articles)
Fem feminine grammatical gender
Fin finite form
Foreign the word form uses an Albanian stem but Slavic morphology (in borrowings and congruent lexicalizations)
Fut future Štokavian future I forms can be realized as synthetic (došću) or analytic (umrijet ću). The latter type is encoded as Linf or Sinf + a personal form of htjeti
Gen genitive case
Imper imperative
Imperf imperfect tense
imperfective imperfective aspect
Indef indefinite form of adjectives
Indic indicative
Ins instrumental case
INTJ interjection
Linf long infinitive kazati
Loc locative case
Lpart l-participle pisao, rekla
main general type of verbs that are not auxiliaries, modals, or phasal verbs
Masc masculine grammatical gender
Mod modal
Native words not tagged as Foreign
NegPol negative polarity nema
Neut neuter grammatical gender
Nom nominative case
NOUN noun (POS)
Npart n-/t- participle zaboravljen
NUM numeral (POS)
NumType numeral type (cardinal, ordinal, sets, total)
Obl oblique case
Ord ordinal (numerals)
Part participles (except for l-participles and n-/t-participles) buduća
PART particle (POS)
perfective perfective verbal aspect
phasal phasal verbs
Plur plural number
Poss words that have possessive semantics (e.g., possessive adjectives) babin
Pres present tense
PRON pronoun (POS)
PROPN proper noun (POS)
Prox proximal form (of deictic articles)
Prs present tense
Reflex reflexive
SCONJ subordinating conjunction
Sets sets as a type of numerals
Sinf short infinitive kazat
Sing singular number
Tot total as a type of numerals
VERB verb (POS)
Vnoun verbal noun slučenje
Voc vocative form (subordinated to cases for the sake of simplicity)
X word forms non-classified for POS
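When search results are exported, the short tags can be expanded into their readings with a simple lookup. The sketch below covers only a handful of tags from the list above; the function name, the selection of tags, and the sample Albanian tag in the usage note are assumptions made for illustration. Lookups are case-sensitive on purpose, because the list distinguishes PART (particle) from Part (participle).

```python
# A small excerpt of the tag list above; the full list would be added the same way.
TAG_READINGS = {
    "3": "third person",
    "Aor": "aorist",
    "Def": "definite form",
    "Fem": "feminine grammatical gender",
    "NOUN": "noun",
    "PART": "particle",
    "Part": "participle (other than l- and n-/t-participles)",
    "Sing": "singular number",
    "VERB": "verb",
    "perfective": "perfective verbal aspect",
}

ALBANIAN_PREFIX = "sq:"  # Albanian tags carry this prefix in the corpus


def expand_tags(tags):
    """Replace each short tag with its reading; Albanian tags are only labeled."""
    readings = []
    for tag in tags:
        if tag.startswith(ALBANIAN_PREFIX):
            readings.append("Albanian tag " + tag[len(ALBANIAN_PREFIX):])
        else:
            readings.append(TAG_READINGS.get(tag, "unknown tag " + tag))
    return readings
```

For example, `expand_tags(["NOUN", "Fem", "Sing", "Def"])` returns the four readings in the same order. A prefixed tag such as the hypothetical `sq:V` is passed through as an Albanian tag, since the Albanian readings belong to a separate tag set.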

Frequently Asked Questions

— What is the Corpus of Slavic dialects in Albania?

This is a language corpus, i.e., a collection of non-adapted transcripts of interviews conducted in the Slavic dialects that are spoken in Albania. Each word form in these dialects that is included in the corpus is enriched with additional linguistic information (annotations). We also provide a user-friendly interface for writing search queries.

— Who needs corpora?

Corpora are used by linguists. The search engines and annotations of corpora are designed to allow for easily making linguistic queries such as “find all pronouns in the accusative case” or “find all forms of the word mačka followed by a verb” or “find all instances of a noun followed by an adjective” so that you can retrieve relevant information from the provided linguistic varieties in seconds. Further analyses of this type of data allow linguists to determine how linguistic varieties have changed, how Albanian has influenced these varieties, what the limits of variation are, or whether there are any new and interesting linguistic phenomena that are not found in Macedonian and Štokavian dialects that have had no contact with Albanian.

Aside from linguists, corpora can be useful tools for language teachers, language learners, and even native speakers.

A corpus documents linguistic varieties in a given period. For example, one of the varieties included (Korça Macedonian) appeared to have gone extinct during our project (or its last speakers became unavailable to the researchers due to their old age). To the best of our knowledge, our corpus includes the last speech examples of this dialect available. It preserves this dialect and other included varieties for future generations and can be used by language activists for language revitalization.

— Can I use the corpus for other things beyond purely linguistic research?

The Corpus of Slavic dialects in Albania makes full transcripts of our recordings available. Beyond purely linguistic questions, the content of the transcripts can be analyzed in its own right: the transcripts are diverse and include many narratives that bear on oral history, identity, and anonymized personal biographies. Our transcripts also include much ethnographic and ethnolinguistic information on the traditional culture of the communities, which can be relevant for ethnolinguists, ethnographers, and members of the communities. There are also many examples of oral folk traditions (songs, tales, proverbs, etc.) available for researchers and the general public.

— Can I use the corpus as a dictionary?

You might not be able to use this corpus like you would a traditional dictionary because it does not provide translations or explanations of the included words. You may, however, discover in which context the word is used, which you can then use to clarify the word’s meaning.

— What is morphological annotation, and how is it obtained?

Our corpus was lemmatized and morphologically annotated. Lemmatization means that each word in the texts was annotated with its lemma, i.e., its dictionary or citation form. Morphological annotation means that the grammatical features of each word were annotated, including its part of speech, number, case, tense, etc. Since the corpus was too large for manual annotations, it was annotated automatically with programs called morphological analyzers.

We used analyzers compiled for standard Macedonian, since it is structurally closest to Golloborda, Korça, and Prespa Macedonian, as well as analyzers compiled for Štokavian-based standard languages for Myzeqe and Shijak Štokavian (mostly the Croatian analyzer, since it could account for dialectal variation in phonology and morphology within the Štokavian dialects). The language names of the respective analyzers make no claims about the identities of our speakers; they are used only as references to the external resources.

The results of the automatic annotation were partially proofread and edited manually. Our corpus still has homonymy, i.e., cases in which one word form has several possible morphological analyses. For example, ja in Macedonian dialects can mean ‘I’ (first-person singular personal pronoun in the nominative case), ‘her’ (third-person singular feminine personal pronoun in the accusative case), ‘here’ (a deictic particle), etc. Hence, search results may include false positives, and manually checking the data you find in the corpus is strongly recommended.
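One way to organize the recommended manual check is to separate the hits whose analyses all match the query from those where only some of the competing analyses match. The sketch below is an illustration only: the data structure for a hit, the function name, and the exact tag combinations given for ja are assumptions based on the description above and do not reflect the export format of the corpus.

```python
def needs_manual_check(hit, wanted):
    """Return True if only some of the hit's competing analyses contain all wanted tags."""
    matches = [wanted <= set(analysis) for analysis in hit["analyses"]]
    return any(matches) and not all(matches)


# The homonymous form ja with three competing analyses (tag combinations assumed).
ja = {
    "form": "ja",
    "analyses": [
        ["PRON", "1", "Sing", "Nom"],         # 'I'
        ["PRON", "3", "Sing", "Fem", "Acc"],  # 'her'
        ["PART"],                             # 'here' (deictic particle)
    ],
}

# A hypothetical form with a single analysis.
unambiguous = {"form": "w", "analyses": [["PRON", "3", "Sing", "Fem", "Acc"]]}
```

A query for pronouns in the accusative, `needs_manual_check(ja, {"PRON", "Acc"})`, returns `True`, because only one of the three analyses of ja fits; for the form with a single matching analysis it returns `False`.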

Acknowledgments

This corpus is the main research instrument developed for the project “Contact-induced language change in situations of non-stable bilingualism—Its limits and modelling: Slavic (social) dialects in Albania,” funded by the DFG (German Research Foundation), project number 8750/1-1 (October 16th, 2019–April 30th, 2024). The principal investigator was Dr. Maxim Makartsev. See List of project-related publications.

The concept, development, and realization of this project would not have been possible without the constant support of Prof. Dr. Gerd Hentschel, to whom I owe a gratitude that words cannot express. I am deeply indebted to Prof. Dr. Jan Patrick Zeller and my other colleagues from the Institute for Slavistics (Carl von Ossietzky Universität Oldenburg) for their support and thorough feedback on my project during its various stages.

Authors

The corpus was developed and is maintained by:

The current version of the corpus uses the platform tsakorpus developed by Dr. Timofey Arkhangelskiy. It is stored on the server of Carl von Ossietzky University Oldenburg.

I am deeply grateful to Dr. Elena Uzeneva, my cooperation partner (in the period 01.01.2020–31.03.2022), without whom this project would not have been possible.

Several field trips were undertaken to collect the speech samples that were included in the corpus. In 2010–2019, Dr. Maxim Makartsev organized these trips using his own resources (see his legacy page on the site of his previous host institution for publications based on those data). The participants in these field trips were:

  • Dr. Mikhail Chivarzin, Moscow—Shenzhen (MC)
  • Alexandra Chivarzina, Moscow—Shenzhen (AC)
  • Renata Hamidullina, Perm—Vienna (RH)
  • Marina Mihajlova, Sofia—Calgary (MMI)

In 2020–2022, Dr. Maxim Makartsev and Dr. Elena Uzeneva organized the field trips within the framework of the aforementioned project funded by the DFG. The participants in these field trips were:

The photo on the start page was taken in Trebisht / Требишта, Albania, by Aino Väänänen, an independent documentary photographer, during our joint research field trip for the aforementioned project in 2020 and was used with her kind permission.

The transcripts for the corpus were prepared by:

  • Hristina Angeleska, Prilep
  • Bojana Damnjanović, Helsinki
  • Pavel Falaleev, Helsinki
  • Đorđe Genović, Belgrade
  • Violeta Jordanova, Skopje
  • Dr. Natalia Kikilo, Moscow
  • Dr. Maxim Makartsev, Oldenburg
  • Milan Milenović, Belgrade
  • Ekaterina Panova, Saint Petersburg
  • Uliana Putilina, Moscow
  • Anđela Redžić, Belgrade
  • Maria Stryszewska, Wrocław
  • Ekaterina Titova, Moscow

We are deeply grateful to our speakers and local assistants whose hard work and passion made it possible to make this corpus available. We cannot disclose their names and personal details for their protection.

References

Berman, Ruth A., Dan I. Slobin, Sven Stromqvist, and Ludo T. Verhoeven. 1994–2004. Relating Events in Narrative. Hillsdale, N.J.: L. Erlbaum Associates.

Bojović, Jovan R., ed. 1991. Stanovništvo slovenskog porijekla u Albaniji : zbornik radova sa međunarodnog naučnog skupa održanog u Cetinju 21, 22. i 23. juna 1990. Titograd: Stručna knjiga.

Čëxa, Oksana V. 2009. “Novogrečeskaja leksika narodnoj astronomii v sopostavlenii s balkanoslavjanskoj: Luna i lunnoe vremja (ėtnolingvističeskij aspekt).” Ph.D., Institute of Slavic Studies, Russian Academy of Sciences. https://inslav.ru/event/chyoha-oksana-vladimirovna-novogrecheskaya-leksika-narodnoy-astronomii-v-sopostavlenii-s.

Cvetanovski, Goce. 2010. Govorot na makedoncite vo Mala Prespa: zapadnoprespanski govor. Skopje: Institut za makedonski jazik “Krste Misirkov”.

Hentschel, Gerd, and Jan P. Zeller. 2013. “Gemischte Rede, gemischter Diskurs, Sprechertypen: Weißrussisch, Russisch und gemischte Rede in der Kommunikation weißrussischer Familien.” In Wiener Slawistischer Almanach 70, edited by Aage A. Hansen-Löve and Tilmann Reuther, 127–55. München, Berlin, Wien: Peter Lang.

Makartsev, Maxim. 2017. “Ėtjudy k balkanskomu bestiariju: Kukuška.” Živaja starina 95 (3): 46.

———. 2023. “Razvoj balkanoslavenskoga tipa futura u štokavskim iseljeničkim dijalektima u Albaniji i jezički kontakti.” Književni jezik (34): 41–69.

Makartsev, Maxim, and Natalia Kikilo. 2022. “Some Tendencies in the Morphosyntax of the Migrational Shtokavian Dialects in Albania (Shijak and Myzeqe) And Slavic-Albanian Language Contact.” Slavic World in the Third Millennium 17 (1-2): 120–41. doi:10.31168/2412-6446.2022.17.1-2.07.

Mayer, Mercer. 1969. Frog, Where Are You? Sequel to a Boy, a Dog and a Frog. New York: Dial Books for Young Readers (a division of Penguin Putnam Inc.).

Mazon, André. 1936. Documents, contes et chansons slaves de l’Albanie du Sud. Bibliothèque d’études balkaniques 5. Paris: Librairie Droz.

Mazon, André, and Maria Filipova-Bajrova. 1965. Documents slaves de l’Albanie du Sud: II. Pièces complémentaires. Bibliothèque d’études balkaniques 8. Paris: Institut d’études slaves.

Plotnikova, Anna A. 2009. Materialy dlja ėtnolingvističeskogo izučenija balkanoslavjanskogo areala. 2, revised. Moskva: Institut slavjanovedenija RAN.

Seliščev, Afanasij M. 1931. Slavjanskoe naselenie v Albanii (s illjustracijami v tekste i s kartoju Albanii). Sofia.

Sobolev, Andrey N., and Aleksandr Novik. 2013. Golo Bordo (Gollobordë), Albanija: Iz materialov balkanskoj ėkspedicij RAN i SPbGU 2008-2010 gg. Materialien zum Südosteuropasprachatlas Bd. 6. Sankt-Peterburg: Nauka.

———. 2017. Gollobordë (Golo Bordo), Shqipëri: Nga materialet e ekspeditës ballkanike të AShR-së dhe UShSt-P-së në vitet 2008-2010. Translated by Ligor Cullufe. Tiranë: Botimet Toena.

———. 2018. Golo Brdo: Od materijalite na balkanskata ekspedicija na RAN i SPbDU vo 2008-2010 godina. Materialien zum Südosteuropasprachatlas Band 6. Skopje, Sankt Peterburg: Univerzitet "Sv. Kiril i Metodij"; Institut za makedonski jazik "Krste Misirkov"; "Nauka".

Steinke, Klaus, and Xhelal Ylli. 2007. Die slavischen Minderheiten in Albanien (SMA): 1. Teil. Prespa-Vërnik-Boboshtica. Slavistische Beiträge 458. München: Otto Sagner.

———. 2008. Die slavischen Minderheiten in Albanien (SMA): 2. Teil. Golloborda-Herbel-Kërçishti i Epërm. Slavistische Beiträge 462. München: Otto Sagner.

———. 2010. Die slavischen Minderheiten in Albanien (SMA): 3. Teil. Gora. Slavistische Beiträge 474. München: Otto Sagner.

———. 2013. Die slavischen Minderheiten in Albanien (SMA): 4. Teil. Vraka-Borakaj. Slavistische Beiträge 491. München, Berlin: Sagner.

Tončeva, Veselka. 2014. Našencite v Albanija: Istorija, ezik, tradicii. Sofia: Ongăl.

Vidoeski, Božidar. 1998. Dijalektite na makedonskiot jazik. Dijalektite na makedonskiot jazik 1. Skopje: Makedonska Akademija na naukite i umetnostite.

Ylli, Xhelal. 1997. Das slavische Lehngut im Albanischen. Teil 1 : Lehnwörter. Slavistische Beiträge. Digitale Ausgabe 350. München: Verlag Otto Sagner.

———. 2000. Das slavische Lehngut im Albanischen. Teil 2 : Ortsnamen. Slavistische Beiträge. Digitale Ausgabe 395. München: Verlag Otto Sagner.

Contact


If you have questions, would like to propose collaboration, or noticed an error in the corpus, please contact Dr. Maxim Makartsev.