“Jazikot eje kaj čovekot. Eje zdraf, ako ese ljuditi zdravi. Eje živ, ako este i ljuditi živi. Umbri čoveko, umbri i jaziko. Mjene mi grjej žl’e ščo jaziko naš umbre. Toko umbreje i ljuditi, i koj ža zborvi? Nema koj da zborvi...”
“Languages are like humans. They are healthy if the people are healthy. They are alive if the people are alive. If a person dies, a language dies. I feel sorry that our language died, but the people died as well, and who is going to speak [the language]? Nobody is…”
These are the main parameters of the corpus:
Dialect | Rural locations | Urban locations | Size, thousand words (excluding the utterances of the researchers) | Morphological analysis |
---|---|---|---|---|
Korça Macedonian | Boboshtica | Korça | 34.0 | Classla 2.1.1 for Macedonian, partial manual postprocessing |
Prespa Macedonian | Pustec, Gorna Gorica, Dolna Gorica, Shulin | Elbasan, Korça | 171.3 | Classla 2.1.1 for Macedonian, partial manual postprocessing |
Golloborda Macedonian | Trebisht, Vërnica, Malestreni | Durrës, Elbasan, Tirana | 239.7 | Classla 2.1.1 for Macedonian, partial manual postprocessing |
Myzeqe Štokavian | Rreth Libofsha, Petova | Fier | 58.8 | Classla 2.1.1 for Serbo-Croatian, partial manual postprocessing |
Shijak Štokavian | Borake, Koxhas | Shijak, Sukth | 68.8 | Classla 2.1.1 for Serbo-Croatian, partial manual postprocessing |
Albanian | All of the above | All of the above | 34.7 | uniparser-albanian |
Other languages: Bulgarian, English, French, German, Greek, BCMS, Italian, Russian, Turkish | | | 4.5 | not analyzed |
Total | | | 611.8 | |
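As a quick consistency check, the per-variety sizes in the table can be summed to reproduce the stated total. The following is only a minimal sketch: the dictionary and variable names are illustrative and are not part of the corpus tooling.

```python
# Sizes in thousands of words, copied from the table above.
sizes = {
    "Korça Macedonian": 34.0,
    "Prespa Macedonian": 171.3,
    "Golloborda Macedonian": 239.7,
    "Myzeqe Štokavian": 58.8,
    "Shijak Štokavian": 68.8,
    "Albanian": 34.7,
    "Other languages": 4.5,
}

# Round to one decimal place to avoid floating-point noise.
total = round(sum(sizes.values()), 1)

# Sub-total for the five Slavic dialects only (illustrative grouping).
non_dialect_rows = ("Albanian", "Other languages")
dialect_total = round(
    sum(size for name, size in sizes.items() if name not in non_dialect_rows), 1
)

print(total)          # 611.8
print(dialect_total)  # 572.6
```

The five Slavic dialects thus account for the bulk of the corpus, with the Albanian and other-language material making up the remainder.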
This project is not the first study of Slavic dialects in Albania (SDAs), but we did consider varieties that have never been studied (e.g., Slavic speech in urban Albanian settings; Myzeqe Štokavian).
Starting from the groundbreaking monograph Slavic populations of Albania by A. Seliščev (1931), there have been several publications on SDAs that covered both the history of the Slavic population in this country (e.g., Ylli’s (1997, 2000) monograph on Slavic borrowings in Albanian toponymy) and its current state (Bojović 1991; Tončeva 2014; Vidoeski 1998). Of the utmost importance is the four-volume work “Die slavischen Minderheiten in Albanien” by Steinke and Ylli (2007, 2008, 2010, 2013), supported by a Deutsche Forschungsgemeinschaft (DFG) grant between 2002 and 2011.
In the selected publications listed, as well as in those dedicated to separate dialects (see the outline of dialects below), you can find descriptions of the language systems of the SDAs, information about the current status of the communities that use the SDAs, and dialectal transcripts.
The labels for the language varieties included in the corpus do not make any claims about the national, ethnic, or other identities of the speakers; they are provided purely for orientation in terms of the respective dialectologies. The labels are not necessarily those that the speakers themselves used, either. In fact, some speakers do not use any labels at all, while others use a variety of labels, often non-terminologically. The language issue within some of the ethnolinguistic minorities in Albania is heavily politicized; however, this corpus does not carry the political claims of any political organization, party, group of individuals, or state. The language labels used in the external sources quoted here are as in the original for identification purposes only.
Five dialects were chosen for this project, as shown on the Google Map.
Golloborda Macedonian is a peripheral Balkan Slavic dialect that continues West Macedonian Debar dialects in Albanian territory. It is spoken in 15 villages in the Albanian regions of Dibra and Elbasan, as well as in migrant communities in the cities of Durrës, Tirana, and Elbasan. It has been estimated that it has more than 7,000 speakers in total. In its rural centers, this community has been studied thoroughly by a team of researchers from the Institute of Linguistic Studies (Russian Academy of Sciences), Saint Petersburg State University, and Peter the Great Museum of Anthropology and Ethnography (Kunstkammer); their research resulted in a valuable monograph that was translated into Albanian and Macedonian. Notably, our corpus-based research focused on the sociolinguistic variation and changes within this dialect, specifically between its rural and urban centers, so our methods and data differ from those in the research of our peers from Saint Petersburg.
Selected literature: Steinke & Ylli (2008); Sobolev & Novik (2013, 2017, 2018).
Korça Macedonian is an apparently extinct Balkan Slavic island dialect (structurally close to the dialectal area of Southeastern Macedonian). The corpus includes speech samples from the last six speakers (three of whom lived in the village of Boboshtica and three of whom lived in the town of Korça in Southeastern Albania but were originally from Drenova). Despite the community being so small, this dialect was crucial for the project, as it has been subjected to the most prolonged and intensive Albanian influence. However, family discussions could not be organized in this community.
Selected literature: Mazon (1936); Mazon and Filipova-Bajrova (1965); Steinke and Ylli (2007).
Prespa Macedonian is a peripheral Balkan Slavic dialect that continues the West Macedonian Ohrid-Prespa dialects on the Albanian side of Great Prespa Lake, transitional to Southeastern Macedonian. According to estimates, it has around 4,500 speakers in nine villages of the region and two large towns, namely Korça and Bilisht.
Selected literature: Steinke and Ylli (2007); Cvetanovski (2010).
Myzeqe Štokavian, used in several quarters of Fier and in several villages around this town, is a Štokavian island dialect spoken among recent (1920s) migrants from the Sandžak region (a Novi Pazar-Sjenica dialect of the Zeta-Sjenica dialectal zone) of what is now Southwestern Serbia and the bordering region of Montenegro.
Selected literature: Makartsev and Kikilo (2022); Makartsev (2023).
Shijak Štokavian, used in the village of Borake and its satellite village of Koxhas, is a Štokavian island dialect spoken among relatively recent (from the 1880s) migrants from the Mostar region in what is now Bosnia and Herzegovina (a central Herzegovinian subdialect of the East Bosnian dialectal zone, spoken in the Mostar–Čapljina–Stolac triangle). It is spoken by 150 to 220 families in both villages. Speakers of this dialect also live in the towns of Shijak and Sukth.
Selected literature: Steinke and Ylli (2013); Makartsev and Kikilo (2022); Makartsev (2023).
These dialects have varying degrees of structural affinity with Albanian due to their differing connections to the Balkan sprachbund. Structurally, the closest to Albanian are the Balkan Slavic dialects of Korça, Golloborda, and Prespa. The Štokavian dialects of Myzeqe and Shijak are not included in the Balkan sprachbund and show less structural affinity with Albanian.
The selected dialects do not represent all the SDAs; one of the most complete lists of SDAs can be found in Steinke and Ylli’s monograph. However, the selected dialects exhibit the variety of elements and parameters that have caused the diversity in the SDAs.
Two of the dialects are Štokavian (Myzeqe and Shijak), but their speakers have differing ethnopolitical and linguistic orientations: Our interviewees in Shijak usually articulated their Bosniak identity; Myzeqe speakers usually clarified their Bosniak or Serbian identity. Three dialects are Balkan Slavic (Golloborda, Korça, Prespa). The orientation of the speakers of these dialects toward a standard language (Macedonian or Bulgarian) is usually individual. The ethnopolitical and linguistic orientations mentioned by the speakers were thus not interpreted as making any political claims, but they allowed us to thematize the orientation of the speakers toward one of the standard Southern Slavic languages and better explain certain features in their speech.
Four of the communities have a rural (more conservative) and an urban (less conservative) center (except Korça Macedonian, whose number of speakers did not allow us to construct this opposition). For Shijak, the labor activity of the speakers was important, especially in terms of whether their jobs were connected to work at the national road services. (Such workers have daily contact with the language of TIR truck drivers who speak BCMS.) The opposition of rural versus urban was not relevant for the Shijak data because of the short distance between the settlements and the small size of the towns of Sukth and Shijak. The city of Durrës, which is where many of the dialectal speakers work daily, is also located too close to Shijak to allow for the shaping of a significant urban colony with distinct features.
Religion was another factor that we considered might influence linguistic identity choices (cf. the linguo-confessional situation in Bosnia and Herzegovina and regions of Montenegro and Serbia populated by a Štokavian-speaking but traditionally Muslim population) since it is relevant to numerous distinctive features of the traditional culture. All Myzeqe and Shijak Štokavian speakers that we interviewed culturally and traditionally belong to Sunni Islam. Among the Balkan Slavic communities, all our Korça and Prespa speakers culturally belong to Orthodox Christianity, while Golloborda is heterogeneous, with Sunni Islam predominating.
Topic | Type | Bibliographical reference |
---|---|---|
1. Narratives and memorates | unstructured | – |
2. Ethnographical and ethnolinguistic interviews: | | |
- 2.1. Calendar, rites of passage (birth, marriage, death), demonology | semi-structured | (Plotnikova 2009, see the online publication) |
- 2.2. Rites and beliefs connected to the moon | semi-structured | (Čëxa 2009) |
- 2.3. Rites and beliefs connected to the cuckoo | semi-structured | (Makartsev 2017) |
3. Frog, Where Are You? | | (Mayer 1969, see preview; Berman et al. 1994–2004) |
- 3.1. Conducted by the researchers | graphic | |
- 3.2. Conducted by trained local assistants | graphic | |
4. Family talks | unstructured | (Hentschel and Zeller 2013) |
The narratives and memorates (T. 1) were unstructured discussions about the oral history and current problems of a given community that also provided insights into the identity and politics of memory of the community. The researchers led these discussions.
The ethnographical and ethnolinguistic interviews (T. 2) were conducted to collect ethnographical and ethnolinguistic information. They comprised the informants’ answers to our questions and covered various aspects of the traditional culture. We mainly followed the structure of the questionnaires (or interview designs) listed in the table, with slight adaptations.
Frog, Where Are You? (T. 3) is a book with 24 pictures that combine to form a visual narrative. This section was structured as a questionnaire, ensuring that the researcher was minimally involved. We also asked our trained local assistants to record themselves or their relatives and friends answering this questionnaire; therefore, the data that we collected here resembled real-life language use.
Our trained local assistants organized the family talks (T. 4) in our absence. The aim was to record spontaneous speech, so the topics were irrelevant. Since the same assistants prepared the transcripts, they could omit any sections that contained potentially harmful information or could have been used to identify the speakers.
Collecting this type of data was most successful for Golloborda Macedonian speakers since we had a network of trained local assistants upon whom we could rely.
We managed to arrange a few family talks among Prespa Macedonian speakers and just one family talk with Shijak Štokavian speakers.
For Myzeqe Štokavian, arranging family talks has not yet been successful.
For Korça Macedonian, such discussions were impossible since none of our speakers still used the dialect daily, although they could still speak it with us. The epigraph above comes from one of the speakers from Drenova (Dre01), who described the frustration he felt while witnessing the attrition and loss of his native dialect.
The speakers in the transcripts were divided into three main categories:
1) Native speakers of the respective SDAs. They were anonymized. All information that could be used to identify them was manually removed from the corpus (tagged as ((ERASED))). We also altered their voices to make them unrecognizable. All speakers in this category were referenced with indices comprising three letters (four in the case of Lesh) for the settlement and two digits. We also referred to them by these indices in our publications based on the corpus.
2) Researchers. They were only referenced with letter indices. Their names are provided in the Acknowledgments section.
3) SPK. This abbreviation was used to refer to all other speakers whose speech was transcribed for context but was not annotated for various reasons (an unknown neighbor passing by the window and saying hello, an Albanian-speaking waiter in a village café, some unidentified background voices, etc.).
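The index conventions described above lend themselves to a simple automatic check, for instance when grouping transcript lines by speaker category. The sketch below is hypothetical and not part of the corpus tooling: the regular expressions are inferred from the indices listed in the speaker table (e.g., Dre01, Lesh02, MM, MMI), and the function name and category labels are our own. The table entry Non01 is treated here like SPK, which is an assumption.

```python
import re

# Assumed patterns, inferred from the indices in the speaker table.
NATIVE = re.compile(r"^([A-Z][a-z]{2,3})(\d{2})$")  # e.g. Dre01, Lesh02
RESEARCHER = re.compile(r"^[A-Z]{2,3}$")            # e.g. MM, MMI

# Placeholders for speakers who were not annotated (assumption for Non01).
PLACEHOLDERS = ("SPK", "Non01")


def classify_index(index):
    """Return (category, settlement_prefix) for a speaker index."""
    if index in PLACEHOLDERS:
        return ("other", None)
    match = NATIVE.match(index)
    if match:
        # The letter part identifies the settlement, e.g. "Dre" for Drenova.
        return ("native", match.group(1))
    if RESEARCHER.match(index):
        return ("researcher", None)
    return ("unknown", None)


for idx in ["Dre01", "Lesh02", "MM", "SPK"]:
    print(idx, classify_index(idx))
```

Checking the placeholders first matters because SPK would otherwise match the researcher pattern and Non01 the native-speaker pattern.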
Index | Gender | Status | Year of birth | Origin | Residence | Dialect | Education | Other languages spoken | Family members | Occupation | Comment | Classes graduated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AC | f | researcher | 1990 | Shahty | Moscow | | Ph.D. | Macedonian, Albanian | | researcher | | |
AL | m | researcher | 1988 | Moscow | Moscow | | Ph.D. | Bulgarian, Turkish, English | | researcher | | |
AV | f | researcher | 1986 | Hyvinkää | Oulu | | MA | Croatian, English | | researcher | | |
Bob01 | f | native | 1936 | Boboshtica | Boboshtica | Boboshtica | professional | Albanian | | teacher of Albanian | | |
Bob02 | m | native | 1925 | Boboshtica | Boboshtica | Boboshtica | higher | Albanian | | engineer | higher education in Charles University in Prague | |
Bob03 | m | native | 1930 | Boboshtica | Boboshtica | Boboshtica | professional | Albanian | | teacher | | |
Bor01 | f | native | 1999 | Borake | Borake | Borake | higher (unfinished) | Albanian | bride of Bor02 | student | | |
Bor02 | f | native | 1990 | Borake | Borake | Borake | middle | Albanian | brother of Bor11, son of Bor12 | port worker | | |
Bor04 | m | native | 1935 | Borake | Borake | Borake | middle school (unfinished) | Albanian | | port worker | Classes graduated: 3 grades elementary, 1 grade school for tractors | 4 grades |
Bor06 | f | native | 1926 | Borake | Borake | Borake | middle school (unfinished) | Albanian | grandmother of Bor19 | farmer | | 3 grades |
Bor07 | f | native | 1935 | Borake | Borake | Borake | middle school (unfinished) | Albanian | father's sister-in-law of Bor19 | farmer | | 3 grades |
Bor08 | m | native | 1984 | Borake | Zagreb | Croatian | higher | Albanian | | IT | | |
Bor10 | m | native | 1947 | Borake | Borake | Borake | professional | Albanian | father of Bor09, Bor19 and Bor23, brother of Bor12 | teacher of maths and Bosnian language at school | | |
Bor11 | m | native | 1979 | Borake | Borake | Borake | middle | Albanian | brother of Bor02, son of Bor12 | worker | | |
Bor12 | m | native | 1950 | Borake | Borake | Borake | middle | Albanian | father of Bor11 and Bor02, husband of Bor25 | worker | | |
Bor15 | m | native | 1957 | Borake | Sukth | Borake | middle | Albanian | father of Bor24 | musician | | |
Bor16 | m | native | 1949 | Borake | Sukth | Borake | middle | Albanian | father of Bor18, husband of Bor17 | worker, builder | mother born in Potkosa, Domanovići | |
Bor17 | f | native | 1949 | Borake | Sukth | Borake | middle | Albanian | mother of Bor18, wife of Bor16 | farmer | | |
Bor18 | m | native | 1969 | Borake | Sukth | Borake | middle | Albanian | son of Bor16 and Bor17 | port worker | | |
Bor19 | f | native | 1987 | Borake | Zagreb | Borake | higher | Albanian | | teacher of English, nurse | | |
Bor20 | m | native | 1920 | Borake | Borake | Borake | middle school (unfinished) | Albanian | | farmer | | 3 grades |
Bor21 | m | native | 1945 | Borake | Borake | Borake | middle school (unfinished) | Albanian | son of Bor22 | farmer | | 7 grades |
Bor22 | m | native | 1920 | Borake | Borake | Borake | middle school (unfinished) | Albanian | father of Bor21 | farmer | | 3 grades |
Bor23 | f | native | 1977 | Borake | Borake | Borake | higher | Albanian | | teacher of biology and chemistry at school | | |
Bor24 | m | native | 1983 | Borake | Borake | Borake | middle | Albanian | son of Bor15 | musician | | |
Bor25 | f | native | 1950 | Borake | Borake | Borake | middle school (unfinished) | Albanian | wife of Bor12 | farmer | | 3 grades |
Bor26 | f | native | 1950 | Borake | Borake | Borake | middle school (unfinished) | Albanian | daughter of Bor04 | farmer | | 3 grades |
Bor27 | f | native | 1940 | Sukth | Borake | Borake | middle | Albanian | wife of Bor04 | farmer | | |
Bor28 | m | native | 1947 | Borake | Sukth | Borake | middle | Albanian | | car master, hotel owner | | |
Bor29 | m | native | 1980 | Borake | Sukth | Borake | middle | Albanian | son of Bor28 | waiter | | |
Bor30 | f | native | 1950 | Sukth | Sukth | Borake | middle | Albanian | wife of Bor28? | cook | she is Albanian, but has learnt the Borake dialect "in three months" after she got married | |
Bor32 | f | native | 1938 | Borake | Borake | Borake | middle school (unfinished) | Albanian | | farmer | | 7 grades |
Bor33 | f | native | 1972 | Borake | Durrës | Borake | middle school (unfinished) | Albanian | | trader | | 8 grades |
Dre01 | m | native | 1934 | Drenova | Korça | Boboshtica | higher | Albanian | brother of Dre02 and Dre03 | teacher of maths | | |
Dre02 | f | native | 1927 | Drenova | Korça | Boboshtica | middle | Albanian | sister of Dre01 and Dre03 | housewife | | |
Dre03 | m | native | 1922 | Drenova | Korça | Boboshtica | middle | Albanian | brother of Dre01 and Dre02 | farmer | | |
EU | f | researcher | 1968 | Moscow | Moscow | | Ph.D. | Bulgarian | | researcher | | |
Elb01 | m | native | 1995 | Elbasan | Elbasan | Golloborda | middle | Albanian | | | family from Stebleva | |
Elb02 | m | native | 1979 | Cerrik | Cerrik | Albanian | middle | Greek | husband of Tre30, father of Tre31, Tre32, Tre29 | builder | | |
Erb01 | f | native | 1975 | Erbele | Tirana | Golloborda | higher | Albanian | | housewife | | |
Tre80 | m | native | Elbasan | Albanian | Only speaks Albanian | |||||||
Gji01 | f | native | 1970 | Gjinovec | Elbasan | Golloborda | middle | Albanian | | | | |
Gll01 | m | native | 1943 | Glloboçani | Glloboçani | Prespa | middle | Albanian | | farmer, worker | | |
Gll02 | m | native | 1958 | Glloboçani | Glloboçani | Prespa | professional | Albanian | | administrator | | |
Gor01 | f | native | 1946 | Shulin | Gorna Gorica | Prespa | middle | Albanian | | farmer | | |
Gor02 | m | native | 1952 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | farmer | | |
Gor03 | m | native | 1995 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | beekeeper | | |
Gor05 | f | native | 1955 | Gorna Gorica | Elbasan | Prespa | higher | Albanian | | nurse | left Prespa in 1973 | |
Gor06 | m | native | 1970 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | farmer | | |
Gor07 | m | native | 1948 | Gorna Gorica | Gorna Gorica | Prespa | middle school (unfinished) | Albanian | | church administrator | | 7 grades |
Gor08 | m | native | 1970 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | political administrator | | |
Gor09 | m | native | 1960 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | businessman | | |
Gor10 | f | native | 1952 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | | | |
Gor11 | m | native | 1966 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | | beekeeper | | |
Gor12 | f | native | 1966 | Gorna Gorica | Gorna Gorica | Prespa | middle | Albanian | father of Gor03, husband of Gor12 | beekeeper | | |
Gor13 | m | native | 1935 | Gorna Gorica | Gorna Gorica | Prespa | middle school (unfinished) | Albanian | | farmer | | |
Gor14 | m | native | 1965 | Gorna Gorica | Korça | Prespa | middle school (unfinished) | Albanian | | | | 8 grades |
Ham01 | m | native | 1970 | Hamil | Fier | Fier | higher | Albanian | cousin of Rre04 | lawyer | mother from Bjelo Pole | |
Ham02 | m | native | 1965 | Hamil | Hamil | Fier | middle | Albanian | son of Pet01 | businessman | | |
Kle01 | f | native | 1982 | Klenje | Elbasan | Golloborda | middle | Albanian | | | | |
Kle02 | m | native | 1977 | Klenje | Tirana | Golloborda | higher | Albanian | | | moved to Tirana in 1997 | |
Kle04 | m | native | 1983 | Klenje | Elbasan | Golloborda | middle | Albanian | | hoxha | moved to Elbasan in 2004 | |
Kor01 | m | native | 1972 | Korça | Korça | Prespa (Tuminec) | higher | Albanian | | teacher | has lived in Korça since 1990 | |
Kor02 | m | native | 1969 | Korça | Pustec | Prespa | higher | Albanian | | teacher | | |
Kor03 | m | native | 1969 | Korça | Pustec | Prespa | higher | Albanian | | teacher | | |
Lesh02 | m | native | 1957 | Leshnicani | Elbasan | Golloborda | professional | Albanian | | worker | moved to Elbasan in 1963 | |
Lesh03 | m | native | 1932 | Leshnicani | Elbasan | Golloborda | middle | Albanian | | farmer | left Trebisht in 1944, moved to Durrës | |
Lesh04 | m | native | 1963 | Leshnicani | Elbasan | Golloborda | middle | Albanian | | worker | | |
Lesh05 | m | native | 1967 | Leshnichani | Tirana | Golloborda | middle | Albanian | | businessman, worker | | |
Lesh06 | m | native | 1969 | Leshnichani | Tirana | Golloborda | middle | Albanian | | worker | | |
Lesh07 | f | native | 1967 | Leshnichani | Fushë Kruja | Golloborda | middle | Albanian | daughter of Lesh05 | | | |
Lesh08 | m | native | 1952 | Elbasan | Elbasan | Golloborda | middle | Albanian | son of Lesh03 | | | |
Lesh09 | f | native | 1952 | Elbasan | Elbasan | Albanian | middle | Albanian | wife of Lesh09 | | Albanian speaker, origin from Trebishta | |
Lesh10 | f | native | 1972 | Elbasan | Elbasan | Albanian | middle | Albanian | daughter of Lesh09 and Lesh10 | | | |
Lla01 | m | native | 1948 | Lladimerica | Elbasan | Golloborda | middle | Albanian | | | | |
Lla02 | f | native | 1938 | Lladimerica | Tirana | Golloborda | middle school (unfinished) | Albanian | | | | 4 grades |
MC | m | researcher | 1991 | Moscow | Moscow | | Ph.D. | Macedonian, English | | researcher | | |
MM | m | researcher | 1984 | Moscow | Oldenburg | | Ph.D. | Albanian, Macedonian, BCMS, English | | researcher | | |
MMI | f | researcher | | Sofia | Calgary | | higher | Albanian, English | | researcher | | |
Mal02 | m | native | 1955 | Malestreni | Tirana | Golloborda | middle | Albanian | | military | | |
Mal04 | m | native | 1984 | Malestreni | Tirana | Golloborda | higher | Albanian | | researcher | | |
NK | f | researcher | 1992 | Železnogorsk | Moscow | | Ph.D. | Macedonian, BCMS | | researcher | | |
NM | f | researcher | 1995 | Altdorf | Zürich | | high school | English, Macedonian | | researcher | | |
Non01 | all non-transcribed speakers | | | | | | | | | | | |
Ost01 | f | native | 1975 | Ostren | Elbasan | Golloborda | middle | Albanian | | | | |
Ost02 | f | native | 1974 | Ostreni | Elbasan | Golloborda | middle | Albanian | | | | |
Pet01 | f | native | 1940 | Petovë | Hamil | Fier | middle school (unfinished) | Albanian | aunt of Rre04 | farmer | | 4 grades |
Pus01 | f | native | 1995 | Pustec | Pustec | Prespa | higher | Albanian | vnu'če od zo'lva of Pus09, daughter-in-law of Pus23 | teacher of sports at school | | |
Pus03 | f | native | 1969 | Pustec | Korça | Prespa | middle | Albanian | | trader | | |
Pus04 | m | native | 1953 | Pustec | Pustec | Prespa | middle | Albanian | | political administrator, journalist, writer | | |
Pus05 | f | native | 1965 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus06 | m | native | 1983 | Pustec | Pustec | Prespa | higher | Albanian | | political administrator, historian, higher education in Skopje | | |
Pus08 | f | native | 1960 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus09 | f | native | 1960 | Pustec | Pustec | Prespa | middle | Albanian | posestrima of Pus24 | farmer | | |
Pus11 | m | native | 1960 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus12 | m | native | 1970 | Pustec | Pustec | Prespa | middle | Albanian | | teacher | | |
Pus13 | f | native | 1960 | Pustec | Pustec | Prespa | middle | Albanian | | | | |
Pus14 | m | native | 1935 | Pustec | Pustec | Prespa | middle | Albanian | | | | |
Pus15 | m | native | 1960 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus16 | m | native | 1978 | Pustec | Korça | Prespa | higher | Albanian | | teacher of Macedonian | graduated in Skopje | |
Pus17 | f | native | 1990 | Korça | Pustec | Prespa | middle | Albanian | wife of Pus18 | farmer | | |
Pus18 | m | native | 1990 | Pustec | Pustec | Prespa | middle | Albanian | husband of Pus17 | | | |
Pus19 | m | native | 1970 | Pustec | Pustec | Prespa | middle | Albanian | father of Pus18? | farmer | | |
Pus20 | f | native | 1950 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus21 | f | native | 1950 | Pustec | Pustec | Prespa | middle | Albanian | | farmer | | |
Pus23 | f | native | 1943 | Gorna Gorica | Pustec | Prespa | middle school (unfinished) | Albanian | mother-in-law of Pus01 | farmer | | 3 grades |
Pus24 | f | native | 1973 | Pustec | Pustec | Prespa | middle | Albanian | posestrima of Pus09 | | | |
Rre01 | m | native | 1968 | Rreth Libofsha | Rreth Libofsha | Fier | middle | Albanian | son of Rre02 and Rre03 | businessman | | |
Rre02 | m | native | 1942 | Rreth Libofsha | Rreth Libofsha | Fier | middle | Albanian | husband of Rre04, father of Rre01 | farmer | father from Bujca | |
Rre03 | f | native | 1945 | Rreth Libofsha | Rreth Libofsha | Fier | middle | Albanian | wife of Rre02, mother of Rre01 | farmer | mother from Jaˋblanica, father from Donane | |
Rre04 | m | native | 1957 | Rreth Libofsha | Rreth Libofsha | Fier | middle | Albanian | | general worker | | |
Rre06 | m | native | 1951 | Rreth Libofsha | Fier | Fier | middle | Albanian | | businessman | | |
Pus22 | m | native | 1980 | Pustec | Pustec | Prespa | higher | Albanian | | political administrator | | |
Ste01 | m | native | 1975 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
Ste02 | f | native | 1942 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
Ste03 | m | native | 1971 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
Ste04 | m | native | 1968 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
Ste05 | f | native | 1972 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
Ste07 | m | native | 1962 | Stebleva | Elbasan | Golloborda | middle | Albanian | | | | |
TG | f | researcher | 1983 | Magadan | Moscow | | Ph.D. | Macedonian, BCMS | | researcher | | |
Tir01 | m | native | 1960 | Tirana | Tirana | Golloborda | middle | Albanian | | trader | | |
Tir02 | f | native | 2009 | Fushë Kruja | Fushë Kruja | Golloborda | none | Albanian | daughter of Lesh07 | preschool student | | |
Tre01 | f | native | 1946 | Trebisht | Trebisht | Golloborda | middle | Albanian | mother of Tre20, Tre27, Tre30 | farmer | | |
Tre02 | m | native | 1964 | Bulqiza | Tirana | Golloborda | professional | Albanian | | military | family from Trebishta, born in Bulqiza | |
Tre03 | f | native | 1956 | Trebisht | Trebisht | Golloborda | middle | Albanian | mother of Tre19 | farmer | | |
Tre04 | m | native | 1936 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre05 | f | native | 1936 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre07 | m | native | 1994 | Trebisht | Trebisht | Albanian | middle | Albanian | | farmer | | |
Tre10 | m | native | 2016 | Trebisht | Trebisht | Golloborda | middle | Albanian | son of Tre11 | farmer | | |
Tre11 | f | native | 1991 | Trebisht | Trebisht | Golloborda | middle | Albanian | daughter of Tre17 | farmer | has been living in Croatia for the last 2-3 years | |
Tre12 | f | native | 1944 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre13 | f | native | 2003 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre14 | m | native | 1968 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre15 | f | native | 1970 | Trebisht | Elbasan | Golloborda | middle | Albanian | | housewife | | |
Tre17 | f | native | 1963 | Trebisht | Trebisht | Golloborda | middle | Albanian | | farmer | | |
Tre18 | f | native | 2005 | Trebisht | Trebisht | Golloborda | middle | Albanian | | school student | | |
Tre19 | m | native | 1977 | Trebisht | Trebisht | Golloborda | none | Albanian | | builder | | |
Tre20 | f | native | 1977 | Trebisht | Trebisht | Golloborda | middle | Albanian | sister of Tre27, Tre30, daughter of Tre01 | housewife | | |
Tre21 | m | native | 2006 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | | school student | | |
Tre22 | f | native | 2003 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | | school student | | |
Tre24 | f | native | 2003 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | | school student | | |
Tre25 | m | native | 1996 | Trebisht | Trebisht | Golloborda | higher (unfinished) | Albanian | | student | | |
Tre27 | m | native | 1970 | Trebisht | Trebisht | Golloborda | middle | Albanian, Greek, BCMS | brother of Tre20, Tre30, son of Tre01 | general worker | | |
Tre29 | m | native | 2006 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | son of Tre30 and Elb02, brother of Tre31 and Tre32 | school student | | |
Tre30 | f | native | 1980 | Trebisht | Cerrik | Golloborda | middle | Albanian | sister of Tre27, Tre20, daughter of Tre01, wife of Elb02, mother of Tre31, Tre32, Tre29 | housewife | | |
Tre31 | f | native | 2002 | Trebisht | Elbasan | Golloborda | higher (unfinished) | Albanian | daughter of Tre30 and Elb02, sister of Tre32 and Tre29 | school student | going to school in Ohrid | |
Tre32 | f | native | 2002 | Trebisht | Trebisht | Golloborda | higher (unfinished) | Albanian | daughter of Tre30 and Elb02, sister of Tre31 and Tre29 | school student | going to school in Ohrid | |
Tre33 | m | native | 1956 | Trebisht | Elbasan | Golloborda | middle | Albanian | | | | |
Tre34 | f | native | 1959 | Trebisht | Elbasan | Golloborda | middle school (unfinished) | Albanian | | | | 4 grades |
Tre35 | m | native | 1947 | Trebisht | Elbasan | Golloborda | middle | Albanian | | | | |
Tre37 | f | native | 1948 | Trebisht | Elbasan | Golloborda | middle | Albanian | | | | |
Tre38 | m | native | 1984 | Trebisht | Trebisht | Golloborda | middle | Albanian | | | works in Croatia and Germany | |
Tre39 | m | native | 1955 | Tirana | Tirana | Golloborda | middle | Albanian | | | father from Vërnica, mother from Malestreni. Family moved to Tirana in 1954 | |
Tre40 | f | native | 1957 | Përmet | Tirana | Golloborda | middle | Albanian | | | parents are from Përmet | |
Tre41 | m | native | 1930 | Trebisht | Tirana | Golloborda | higher | Albanian | | | | |
Tre42 | m | native | 1960 | Trebisht | Tirana | Golloborda | middle | Albanian | | | | |
Tre43 | m | native | 1991 | Tirana | Tirana | Golloborda | middle | Albanian | | | Born and raised in Tirana, now lives in Tirana, speaks Albanian at home | |
Tre44 | m | native | 1996 | Tirana | Tirana | Golloborda | middle | Albanian | | | Born and raised in Tirana, now lives in Tirana, speaks Albanian at home | |
Tre45 | m | native | 1966 | Trebisht | Elbasan | Golloborda | middle | Albanian | | | Born and raised in Trebisht, now lives in Elbasan, speaks Albanian at home | |
Tre46 | m | native | 1990 | Trebisht | Trebisht | Golloborda | middle | Albanian | | | | |
Tre47 | m | native | | Trebisht | Trebisht | Golloborda | | Albanian | | | | |
Tre48 | f | native | 1989 | Trebisht | Trebisht | Golloborda | middle | Albanian | | | | |
Tre49 | f | native | 2011 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | | | | |
Tre50 | f | native | 2014 | Trebisht | Trebisht | Golloborda | middle school student | Albanian | | | | |
Tre51 | m | native | 2006 | Trebisht | Fushë Kruja | Golloborda | middle school student | Albanian | | | | |
Tre52 | m | native | 1967 | Cerrik | Cerrik | Golloborda | middle | Albanian | husband of Tre53 | | | |
Tre53 | f | native | 1967 | Trebisht | Cerrik | Golloborda | middle | Albanian | wife of Tre52 | | moved in 1993 to Cerrik because of her marriage to Tre52 | |
Tre54 | f | native | 2001 | Cerrik | Cerrik | Golloborda | higher | Albanian | | | | |
Tre55 | f | native | 1949 | Elbasan | Trebisht | Golloborda | middle school (unfinished) | Albanian | | | | 7 grades |
Tre56 | f | native | | Trebisht | Tirana | Golloborda | | Albanian | | | | |
Tre57 | m | native | 2015 | Trebisht | Tirana | Golloborda | middle school student | Albanian | | | | |
Tre59 | f | native | 1999 | Tirana | Tirana | Golloborda | student | Albanian | | | | |
Tre60 | m | native | 1949 | Trebishta | Tirana | Golloborda | middle school (unfinished) | Albanian | | | | 7 grades |
Tre61 | f | native | 1952 | Trebishta | Tirana | Golloborda | middle school (unfinished) | Albanian | | | | 7 grades |
Tre62 | f | native | 1971 | Trebishta | Elbasan | Golloborda | middle | Albanian | | | | |
Tre63 | m | native | 1938 | Trebishta | Elbasan | Golloborda | middle | Albanian | | | | |
Tre64 | m | native | | Trebishta | Elbasan | Golloborda | middle | Albanian | | | | |
Tre65 | m | native | 1940 | Trebishta | Elbasan | Golloborda | middle school (unfinished) | Albanian | | | | |
Tre66 | m | native | | Trebishta | Elbasan | Golloborda | | Albanian | | | | |
Tre67 | m | native | 2007 | Trebishta | Tirana | Golloborda | middle school student | Albanian | | | | |
Tre68 | m | native | 2008 | Trebishta | Tirana | Golloborda | middle school student | Albanian | | | | |
Tre69 | f | native | 1964 | Elbasan | Elbasan | Golloborda | middle | Albanian | | | | |
Tre71 | f | native | | Trebisht | Tirana | Golloborda | | Albanian | | | | |
Tre72 | m | native | 2001 | Trebishta | Trebishta | Golloborda | higher | Albanian | | student in Skopje | | |
Tre73 | f | native | 1972 | Trebishta | Trebishta | Golloborda | middle | Albanian | | | | |
Tre75 | m | native | 1964 | Trebishta | Trebishta | Golloborda | middle | Albanian | | | | |
Tre76 | m | native | 1992 | Elbasan | Trebishta | Golloborda | middle | Albanian | moved to Trebishta in 2003 | |||
Tre77 | m | native | 1996 | Trebishta | Trebishta | Golloborda | middle | Albanian | graduated in Prilep | |||
Tum01 | m | native | 1956 | Tuminec | Tuminec | Prespa (Tuminec) | middle | Albanian | ||||
Tum02 | f | native | 2015 | Tuminec | Tuminec | Prespa (Tuminec) | preschool | |||||
Tum03 | f | native | 1994 | Tuminec | Pustec | Prespa (Tuminec) | higher | Albanian | ||||
Tum04 | f | native | 1952 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 2 grades | |||
Tum05 | f | native | 1977 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 8 grades | |||
Tum06 | m | native | 1939 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | husband of Tum07, brother of Tum08 | musician | ||
Tum07 | f | native | 1954 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | wife of Tum06 | |||
Tum08 | f | native | 1940 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | sister of Tum06 | |||
Tum09 | m | native | 1984 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 7 grades | |||
Tum10 | m | native | 1964 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 7 grades | |||
Tum11 | f | native | 1964 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | mother of Tum09 | 7 grades | ||
Tum12 | m | native | 1964 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | father of Tum09 | 5 grades | ||
Tum13 | f | native | 1964 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 7 grades | |||
Tum14 | m | native | 2000 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 7 grades | |||
Tum15 | m | native | 1940 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | cousin of Tum06, father of Tum16 | farmer | 7 grades | |
Tum16 | m | native | 1960 | Tuminec | Tuminec | Prespa (Tuminec) | middle | Albanian | son of Tum15 | |||
Tum17 | m | native | 1955 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 7 grades | |||
Tum18 | f | native | 1931 | Tuminec | Tuminec | Prespa (Tuminec) | middle school (unfinished) | Albanian | 3 grades | |||
Tum19 | m | native | 1999 | Elbasan | Korça | Prespa (Tuminec) | middle | Albanian | ||||
Tum20 | m | native | 1947 | Tuminec | Korça | Prespa (Tuminec) | middle school (unfinished) | Albanian | ||||
Tum21 | f | native | 1952 | Tuminec | Korça | Prespa (Tuminec) | middle school (unfinished) | Albanian | ||||
Tum22 | f | native | 2010 | Elbasan | Korça | Prespa (Tuminec) | middle school student | Albanian | ||||
Tum23 | f | native | 1970 | Tuminec | Tuminec | Prespa (Tuminec) | middle | Albanian | vujna (maternal uncle's wife) of Tum19, Kor01 | farmer | ||
Vor02 | m | native | 1971 | Tirana | Tirana | Golloborda | higher | Albanian | priest | origin from Vërnica |
Our team prepared all transcripts manually. Where possible, trained local assistants (speakers of the dialects who also organized the family discussions) prepared the draft versions. Our editors (specialists in the respective philologies) proofread these drafts, after which Dr. Maxim Makartsev proofread them a second time. Where trained local assistants were unavailable, our editors prepared the transcripts themselves. We used EXMARaLDA Partitur Editor to align the transcripts with the recordings. Albanian and other languages were written in their original scripts. Only the passages in Slavic dialects were proofread; passages in Albanian and other languages are included for context only and were not proofread. (Hence, their spelling ranges from standard orthography to non-orthographic semi-phonetic spelling.)
Our transcription system was as follows:
i | ü (Alb yll, ky) | u |
e | ă (schwa) | o |
a |
m | |||
l | r | n | |
j | lj | nj |
We did not distinguish between the trilled and the flapped r (cf. Albanian [r] as in arrë and [ɾ] as in erë) in Slavic.
p | b | f | v |
t | d | θ | ð |
c | dz | s | z |
č | dž | š | ž |
ć | đ | ś | ź |
kʼ (Macedonian ќ) | gʼ (Macedonian ѓ) | ||
k | g | h |
Stress was indicated by an apostrophe (') following the accentuated vowel, including in one-syllable words. This was necessary to show the position of the stress in phonetic words that included clitics and other unstressed words (prepositions, etc.). Pauses were indicated only within narratives; where the speaker listed words or uttered separate sentences and fragments, pauses were not marked. A pause was indicated as follows: ((0.5)) for a 500 ms pause.
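The stress and pause conventions described above lend themselves to simple pattern matching. The following is a minimal sketch, not part of the corpus tooling: the function names are hypothetical, the vowel set is taken from the vowel chart above, and the example words in the usage note are made up for illustration.

```python
import re
from typing import List, Optional

# Pause marker as described above: a duration in seconds in double parentheses.
# Tags without digits inside the parentheses are not matched.
PAUSE_RE = re.compile(r"\(\(\s*(\d+(?:\.\d+)?)\s*\)\)")

# Vowel symbols from the transcription chart (assumed to be the stress bearers).
VOWELS = "aeiouăü"


def extract_pauses_ms(line: str) -> List[int]:
    """Return all pause durations in a transcript line, in milliseconds."""
    return [round(float(m.group(1)) * 1000) for m in PAUSE_RE.finditer(line)]


def stressed_vowel_index(word: str) -> Optional[int]:
    """Return the index of the vowel followed by the stress apostrophe."""
    for i in range(1, len(word)):
        if word[i] == "'" and word[i - 1].lower() in VOWELS:
            return i - 1
    return None  # no stress mark found


def strip_stress(word: str) -> str:
    """Remove stress apostrophes that directly follow a vowel."""
    return re.sub(rf"(?<=[{VOWELS}])'", "", word, flags=re.IGNORECASE)
```

For instance, `extract_pauses_ms("i posle ((0.5)) dojde")` yields `[500]`, and `strip_stress("jazi'kot")` yields `"jazikot"`. Only the plain apostrophe after a vowel is treated as a stress mark, so the modifier apostrophe in kʼ and gʼ is left untouched.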
For several Štokavian transcripts (e.g., 33 (1), 33 (2), 33 (3)), we also tried to indicate the accent type, which corresponded to the conventions for the respective standards, as follows (symbols accepted in the BCMS national linguistic traditions are provided before the equal sign, with our symbols thereafter):
Tone | long | short |
---|---|---|
rising tone | é = e': | è = e' |
falling tone | ȇ = eˋ: | ȅ = eˋ |
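The correspondences in this table can be applied mechanically. The sketch below converts the corpus notation into the traditional BCMS accent marks; it is an illustration under stated assumptions rather than a project tool. It assumes that the falling-tone symbol is the modifier letter grave accent (U+02CB), that only the five plain vowels carry accent marks, and the function name is hypothetical.

```python
import re
import unicodedata

FALL = "\u02cb"  # modifier letter grave accent, assumed to mark falling tone

# Corpus notation after the vowel -> combining mark of the traditional notation.
ACCENTS = {
    "':": "\u0301",        # long rising   -> acute (é)
    "'": "\u0300",         # short rising  -> grave (è)
    FALL + ":": "\u0311",  # long falling  -> inverted breve (ȇ)
    FALL: "\u030f",        # short falling -> double grave (ȅ)
}

# A vowel, then the tone symbol, then an optional length colon.
ACCENT_RE = re.compile("([aeiouAEIOU])(['" + FALL + "])(:?)")


def to_traditional(text: str) -> str:
    """Rewrite corpus accent notation with traditional BCMS diacritics."""
    def repl(match):
        vowel, tone, length = match.groups()
        return vowel + ACCENTS[tone + length]
    # NFC composes vowel + combining mark into a single character where possible.
    return unicodedata.normalize("NFC", ACCENT_RE.sub(repl, text))
```

Under these assumptions, `to_traditional("vo'da")` gives `vòda`. The sketch does not handle accented syllabic r or unaccented length, which would need additional rules.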
The automatic processing of the Štokavian data fully relied on the presented system being close to Gajica. For the automatic processing of Macedonian, the script was automatically recoded into standard Macedonian Cyrillic, following the procedure for Classla 2.1.1. The only place Cyrillic was used in the user interface was in the Macedonian lemmatization.
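Recoding the Latin transcription into Macedonian Cyrillic can be pictured as a longest-match substitution. The sketch below is an assumption about what such a step might look like, not the project's actual procedure: the mapping table and the function name are hypothetical, stress marks are assumed to have been removed beforehand, and symbols without a standard Macedonian letter (ă, ü, θ, ð, ć, đ, ś, ź) are simply passed through because the text does not say how they were treated.

```python
import re
import unicodedata

# Assumed Latin -> Macedonian Cyrillic correspondences (lower case only).
LATIN_TO_CYRILLIC = {
    "lj": "љ", "nj": "њ", "dž": "џ", "dz": "ѕ",
    "k\u02bc": "ќ", "g\u02bc": "ѓ",  # U+02BC is the modifier apostrophe in kʼ, gʼ
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "ž": "ж",
    "z": "з", "i": "и", "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф",
    "h": "х", "c": "ц", "č": "ч", "š": "ш",
}

# Longest keys first, so that digraphs win over single letters.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(LATIN_TO_CYRILLIC, key=len, reverse=True))
)


def to_cyrillic(text: str) -> str:
    """Recode a stress-free Latin transcription into Macedonian Cyrillic."""
    text = unicodedata.normalize("NFC", text)  # make č, š, ž single characters
    return _PATTERN.sub(lambda m: LATIN_TO_CYRILLIC[m.group(0)], text)
```

A purely greedy mapping of this kind misreads letter sequences that only look like digraphs, for example d + ž across a prefix boundary, so any real recoding step would need exceptions or manual checks.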
We followed several steps for the annotation.
First, we formulated rules to determine the language of each word form. These rules drew on the tags that the transcribers and editors entered manually in EXMARaLDA Partitur Editor, on language-specific scripts (e.g., Greek, Russian), on symbols (e.g., ë, specific to Albanian), and on combinations of symbols (e.g., ll, rr, initial ng and mb, specific or unique to Albanian).
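Rules of this kind can be sketched as a small cascade: a manual tag wins, then the script is checked, then Albanian-specific symbols and combinations. The code below is an illustrative approximation only. The function name, the returned labels, and the fallback to a Slavic default are assumptions, and the actual rule set of the project is not reproduced here.

```python
import re
import unicodedata
from typing import Optional

# Symbols and combinations named above as specific to Albanian:
# ë, ll, rr anywhere in the word; ng and mb only word-initially.
ALBANIAN_HINTS = re.compile(r"ë|ll|rr|^ng|^mb", re.IGNORECASE)


def guess_language(word_form: str, manual_tag: Optional[str] = None) -> str:
    """Guess the language of a word form; labels are hypothetical."""
    if manual_tag:  # a tag set by transcribers or editors takes priority
        return manual_tag
    for ch in word_form:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("GREEK"):
                return "Greek"
            if name.startswith("CYRILLIC"):
                return "Cyrillic-script language"
    if ALBANIAN_HINTS.search(word_form):
        return "Albanian"
    return "Slavic (default)"  # assumed fallback for everything else
```

With this sketch, `guess_language("arrë")` and `guess_language("mbret")` both return `"Albanian"`, while a word form without any of these cues falls back to the Slavic default unless a manual tag says otherwise.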
Second, the respective parsers and taggers were applied depending on the language of the word form (see the table above). Following this, only those parts of the transcripts that the speakers uttered (not the researchers) in Slavic dialects were manually and semi-automatically checked, proofread, and edited. The parts of the transcripts that the researchers uttered or that speakers uttered in other language varieties were not proofread. In such cases, we kept the automatic annotation.
Third, lemmatization was manually checked (for Slavic); the lemmas automatically marked as Albanian were selectively checked and corrected, if needed. For Golloborda, Korça, and Prespa Macedonian, the lemmatization was performed in standard Macedonian—as the closest standard language structurally. For Myzeqe and Shijak, lemmatization was based on Ijekavian standards. For dialectal lexemes that did not exist in the respective standards, standard phonology was applied, resulting in the creation of dummy lemmata that followed standard phonology but cannot be found in standard dictionaries. The only function of these lemmata was to allow for the trans-dialectal search of word forms. If a standard cognate could not be established, we adopted any suitable word form attested in our transcripts.
Fourth, the morphological tags for Macedonian and Štokavian were harmonized, since Classla 2.1.1 uses slightly different MULTEXT-East conventions for each of them. The resulting tag set is provided below.
Fifth, the results of the morphological tagging were selectively checked. We focused on the word forms with the greatest homonymy and the lexemes with the most frequent tokens.
This is the beta version of our corpus, so the manual editing of the morphological tags is ongoing. If you notice an error, please feel free to contact us. When working with the corpus, manual checking of the search results is highly recommended.
Considering the principally bilingual nature of our data and frequent code-switches, we would like to highlight two instruments here:
1) The corpus allows for searching sets of word forms that are specifically ordered (one after another or with one or more irrelevant word forms in between). You may compose the search entry by marking one of the word forms as Slavic (you can choose the dialect) and another as Albanian, which will show all cases of code-switches that follow your chosen parameters.
2) The special field Foreign includes all lexical matter borrowings and congruent lexicalizations from Albanian. (They cannot be formally distinguished since both types have Albanian stems and Slavic inflectional morphology.)
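The ordered search described in the first point can be pictured as a scan over language-tagged tokens. The sketch below is not the corpus search engine; it is a minimal illustration that assumes each token is a (word form, language label) pair, and the function name, the labels, and the example tokens in the usage note are made up.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, language label)


def find_code_switches(
    tokens: List[Token], first_lang: str, second_lang: str, max_gap: int = 0
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where a first_lang token is followed by a
    second_lang token with at most max_gap other tokens in between."""
    hits = []
    for i, (_, lang) in enumerate(tokens):
        if lang != first_lang:
            continue
        last = min(i + 1 + max_gap, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if tokens[j][1] == second_lang:
                hits.append((i, j))
    return hits
```

Given the made-up sequence `[("toj", "Prespa"), ("eje", "Prespa"), ("mësues", "Albanian")]`, a search for a Prespa token directly followed by an Albanian token returns `[(1, 2)]`; allowing one intervening token also returns `(0, 2)`.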
We also distinguished direct speech (tag OWN) and quotations (tag XENO), while several other tags provide additional information about the intonation and context of the interview: ((LAUGH)), ((COUGH)), ((NOISE)).
Transcript_ID | Dialectal type | Dialect | Place of recording | Genre | Researcher attending | Settlement type | Fully accentuated |
---|---|---|---|---|---|---|---|
2 | Štokavian | Shijak | Borake | Interview | yes | Village | no |
10 | Balkan Slavic | Korça | Boboshtica | Interview | yes | Village | no |
12 | Balkan Slavic | Korça | Boboshtica | Interview | yes | Village | no |
13 | Balkan Slavic | Korça | Boboshtica | Frog | yes | Village | no |
14 | Balkan Slavic | Korça | Boboshtica | Frog | yes | Village | no |
15 | Balkan Slavic | Korça | Boboshtica | Frog | yes | Village | no |
16 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
17 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
18 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
19 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
20 | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
22 | Balkan Slavic | Golloborda | Elbasan | Frog | yes | City | no |
24 | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
25 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
26 | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
27 | Balkan Slavic | Korça | Korça | Interview | yes | City | no |
28 | Balkan Slavic | Korça | Korça | Interview | yes | City | no |
30 | Balkan Slavic | Korça | Korça | Frog | yes | City | no |
31 | Balkan Slavic | Korça | Korça | Frog | yes | City | no |
32 | Balkan Slavic | Korça | Korça | Frog | yes | City | no |
35 | Štokavian | Shijak | Borake | Frog | yes | Village | no |
36 | Balkan Slavic | Golloborda | Elbasan | Frog | yes | City | no |
37 | Balkan Slavic | Prespa | Elbasan | Interview | yes | City | no |
38 | Balkan Slavic | Golloborda | Tirana | Frog | yes | City | no |
39 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
40 | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
41 | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
42 | Balkan Slavic | Golloborda | Tirana | Interview | yes | City | no |
44 | Balkan Slavic | Golloborda | Elbasan | Frog | yes | City | no |
45 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
48 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
49 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
50 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
51 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
52 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
54 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
55 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
56 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
57 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
58 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
59 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
60 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
61 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
62 | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
63 | Balkan Slavic | Prespa | Pustec | Family | no | Village | no |
65 | Balkan Slavic | Prespa | Tuminec | Family | no | Village | no |
67 | Balkan Slavic | Prespa | Tuminec | Family | no | Village | no |
70 | Štokavian | Shijak | Borake | Interview | yes | Village | yes |
72 | Štokavian | Shijak | Borake | Interview | yes | Village | yes |
73 | Balkan Slavic | Prespa | Korça | Frog | yes | City | no |
74 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
75 | Balkan Slavic | Prespa | Korça | Frog | yes | City | no |
76 | Štokavian | Shijak | Borake | Interview | yes | Village | no |
77 | Štokavian | Shijak | Borake | Interview | yes | Village | no |
78 | Balkan Slavic | Prespa | Elbasan | Frog | yes | City | no |
79 | Balkan Slavic | Prespa | Korça | Frog | yes | City | no |
80 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
81 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
82 | Balkan Slavic | Prespa | Pustec | Frog | yes | Village | no |
83 | Balkan Slavic | Prespa | Globoçeni | Frog | yes | Village | no |
84 | Balkan Slavic | Prespa | Globoçeni | Interview | yes | Village | no |
86 | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
90 | Balkan Slavic | Prespa | Globoçeni | Interview | yes | Village | no |
91 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
94 | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
96 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
98 | Balkan Slavic | Prespa | Korça | Interview | yes | City | no |
99 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
100 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
101 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
104 | Balkan Slavic | Golloborda | Trebisht | Frog | no | Village | no |
105 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
106 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
108 | Balkan Slavic | Golloborda | Trebisht | Frog | no | Village | no |
109 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
110 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
111 | Balkan Slavic | Golloborda | Trebisht | Family | no | Village | no |
112 | Balkan Slavic | Golloborda | Trebisht | Frog | no | Village | no |
113 | Balkan Slavic | Golloborda | Trebisht | Frog | no | Village | no |
114 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
116 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
117 | Balkan Slavic | Golloborda | Elbasan | Family | no | City | no |
118 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
119 | Balkan Slavic | Golloborda | Tirana | Family | no | City | no |
121 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
122 | Balkan Slavic | Golloborda | Elbasan | Family | no | City | no |
123 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
124 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
125 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
126 | Balkan Slavic | Golloborda | Tirana | Family | no | City | no |
127 | Balkan Slavic | Golloborda | Elbasan | Frog | no | City | no |
128 | Balkan Slavic | Golloborda | Elbasan | Family | no | City | no |
130 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
131 | Balkan Slavic | Golloborda | Elbasan | Family | no | City | no |
133 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
134 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
135 | Balkan Slavic | Golloborda | Tirana | Family | no | City | no |
136 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
137 | Balkan Slavic | Golloborda | Tirana | Family | no | City | no |
138 | Balkan Slavic | Golloborda | Tirana | Frog | no | City | no |
139 | Balkan Slavic | Golloborda | Tirana | Family | no | City | no |
140 | Balkan Slavic | Golloborda | Elbasan | Family | no | City | no |
145 | Balkan Slavic | Prespa | Tuminec | Family | no | Village | no |
146 | Štokavian | Myzeqe | Hamil | Interview | yes | Village | no |
150 | Štokavian | Myzeqe | Fier | Interview | yes | City | no |
151 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
152 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
153 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
154 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
155 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
156 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
157 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
158 | Štokavian | Myzeqe | Rreth Libofsha | Frog | yes | Village | no |
159 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
160 | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | no |
161 | Štokavian | Shijak | Sukth | Interview | yes | Village | no |
162 | Štokavian | Shijak | Sukth | Interview | yes | Village | no |
163 | Štokavian | Shijak | Sukth | Interview | yes | Village | no |
164 | Štokavian | Shijak | Sukth | Interview | yes | Village | no |
165 | Štokavian | Shijak | Borake | Frog | yes | Village | no |
166 | Štokavian | Shijak | Borake | Interview | yes | Village | no |
168 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
169 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
170 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
171 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
172 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
173 | Balkan Slavic | Prespa | Dolna Gorica | Frog | yes | Village | no |
174 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
175 | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
176 | Balkan Slavic | Prespa | Dolna Gorica | Frog | yes | Village | no |
177 | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
178 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
179 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
180 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
181 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
182 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
183 | Balkan Slavic | Prespa | Tuminec | Interview | yes | Village | no |
184 | Balkan Slavic | Prespa | Korça | Family | no | City | no |
186 | Štokavian | Myzeqe | Fier | Interview | yes | City | no |
187 | Štokavian | Myzeqe | Fier | Frog | yes | City | no |
11 (1) | Balkan Slavic | Korça | Boboshtica | Interview | yes | Village | no |
11 (2) | Balkan Slavic | Korça | Boboshtica | Interview | yes | Village | no |
185 (1) | Štokavian | Myzeqe | Fier | Interview | yes | City | no |
185 (2) | Štokavian | Myzeqe | Fier | Interview | yes | City | no |
29 (1) | Balkan Slavic | Korça | Korça | Interview | yes | City | no |
29 (2) | Balkan Slavic | Korça | Korça | Interview | yes | City | no |
29 (3) | Balkan Slavic | Korça | Korça | Interview | yes | City | no |
3 (1) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
3 (2) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
33 (1) | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | yes |
33 (2) | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | yes |
33 (3) | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | yes |
33 (4) | Štokavian | Myzeqe | Rreth Libofsha | Interview | yes | Village | yes |
43 (1) | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
43 (2) | Balkan Slavic | Golloborda | Elbasan | Interview | yes | City | no |
46 (1) | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
46 (2) | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
5 (1) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
5 (2) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
5 (3) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
5 (4) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
53 (1) | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
53 (2) | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
53 (3) | Balkan Slavic | Golloborda | Trebisht | Interview | yes | Village | no |
85 (1) | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
85 (2) | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
87 (1) | Štokavian | Shijak | Sukth | Interview | yes | Village | no |
87 (2) | Štokavian | Shijak | Sukth | Frog | yes | Village | no |
87a | Štokavian | Shijak | Sukth | Frog | yes | Village | no |
88 (1) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
88 (2) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
89 (1) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
89 (2) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
92 (1) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
92 (2) | Balkan Slavic | Prespa | Pustec | Interview | yes | Village | no |
93 (1) | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
93 (2) | Balkan Slavic | Prespa | Dolna Gorica | Interview | yes | Village | no |
95 (1) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
95 (2) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
95 (3) | Štokavian | Shijak | Borake | Interview | yes | Village | no |
188 | Štokavian | Shijak | Borake | Family | no | Village | no |
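The transcript metadata above can be filtered programmatically once the rows are split on the column separator. The following sketch assumes the rows are available as plain pipe-delimited strings exactly as printed in the table; the function names are hypothetical and the corpus interface may expose this metadata differently.

```python
from typing import Dict, List

COLUMNS = [
    "Transcript_ID", "Dialectal type", "Dialect", "Place of recording",
    "Genre", "Researcher attending", "Settlement type", "Fully accentuated",
]


def parse_row(row: str) -> Dict[str, str]:
    """Split one pipe-delimited table row into a column -> value mapping."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, cells))


def select_ids(rows: List[str], criteria: Dict[str, str]) -> List[str]:
    """Return the transcript IDs whose metadata match all criteria."""
    parsed = [parse_row(row) for row in rows]
    return [
        r["Transcript_ID"]
        for r in parsed
        if all(r.get(col) == value for col, value in criteria.items())
    ]
```

For example, `select_ids(rows, {"Fully accentuated": "yes"})` picks out the transcripts with accent-type marking, and `{"Genre": "Family", "Researcher attending": "no"}` selects family discussions recorded without a researcher present.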
— What is the Corpus of Slavic dialects in Albania?
This is a language corpus, i.e., a collection of non-adapted transcripts of interviews conducted in the Slavic dialects that are spoken in Albania. Each word form in these dialects that is included in the corpus is enriched with additional linguistic information (annotations). We also have a user-friendly interface that allows for writing search queries.
— Who needs corpora?
Corpora are used by linguists. The search engines and annotations of corpora are designed to allow for easily making linguistic queries such as “find all pronouns in the accusative case” or “find all forms of the word mačka followed by a verb” or “find all instances of a noun followed by an adjective” so that you can retrieve relevant information from the provided linguistic varieties in seconds. Further analyses of this type of data allow linguists to determine how linguistic varieties have changed, how Albanian has influenced these varieties, what the limits of variation are, or whether there are any new and interesting linguistic phenomena that are not found in Macedonian and Štokavian dialects that have had no contact with Albanian.
Aside from linguists, corpora can be useful tools for language teachers, language learners, and even native speakers.
A corpus documents linguistic varieties in a given period. For example, one of the varieties included (Korça Macedonian) appeared to have gone extinct during our project (or its last speakers became unavailable to the researchers due to their old age). To the best of our knowledge, our corpus includes the last speech examples of this dialect available. It preserves this dialect and other included varieties for future generations and can be used by language activists for language revitalization.
— Can I use the corpus for other things beyond purely linguistic research?
The Corpus of Slavic dialects in Albania makes full transcripts of our recordings available. Beyond purely linguistic interests, the content of the transcripts can itself be analyzed: the transcripts are diverse and include many narratives touching on oral history, identity, and anonymized personal biographies. Our transcripts also include much ethnographic and ethnolinguistic information on the traditional culture of the communities, which can be relevant for ethnolinguists, ethnographers, and members of the communities. There are also many examples of oral folk traditions (songs, tales, proverbs, etc.) available for researchers and the general public.
— Can I use the corpus as a dictionary?
You might not be able to use this corpus like you would a traditional dictionary because it does not provide translations or explanations of the included words. You may, however, discover in which context the word is used, which you can then use to clarify the word’s meaning.
— What is morphological annotation, and how is it obtained?
Our corpus was lemmatized and morphologically annotated. Lemmatization means that each word in the texts was annotated with its lemma, i.e., its dictionary or citation form. Morphological annotation means that the grammatical features of each word were annotated, including its part of speech, number, case, tense, etc. Since the corpus was too large for manual annotations, it was annotated automatically with programs called morphological analyzers.
We used analyzers compiled for standard Macedonian since it is closest to Golloborda, Korça, and Prespa Macedonian structurally, as well as analyzers compiled for Štokavian-based standard languages for Myzeqe and Shijak Štokavian (mostly the Croatian analyzer since it could account for dialectal variations in phonology and morphology within the Štokavian dialects). The language denominators for the respective analyzers do not make any claims about the identities of our speakers and were only used as references in external sources.
The results of the automatic annotation were partially proofread and edited manually. Our corpus still contains homonymy, i.e., cases where one word form has several possible morphological analyses. For example, ja in Macedonian dialects can mean ‘I’ (first-person singular personal pronoun in the nominative case), ‘her’ (third-person singular feminine personal pronoun in the accusative case), ‘here’ (a deictic particle), etc. Hence, a search within the corpus may return false positive results. Manually checking the data you find in the corpus is thus strongly recommended.
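The effect of homonymy on search results can be shown with a toy example. In the sketch below, the analyses for ja follow the description above, but the feature names and values are invented for illustration and do not correspond to the corpus tag set; the function names are likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical, non-disambiguated analyses stored for the word form "ja".
ANALYSES: Dict[str, List[Dict[str, str]]] = {
    "ja": [
        {"pos": "pronoun", "person": "1", "number": "sg", "case": "nom"},
        {"pos": "pronoun", "person": "3", "number": "sg", "gender": "f", "case": "acc"},
        {"pos": "particle"},
    ],
}


def matching_analyses(word_form: str, query: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every stored analysis of word_form compatible with the query."""
    return [
        analysis
        for analysis in ANALYSES.get(word_form, [])
        if all(analysis.get(key) == value for key, value in query.items())
    ]


def is_hit(word_form: str, query: Dict[str, str]) -> bool:
    """A token counts as a hit if at least one of its analyses matches."""
    return bool(matching_analyses(word_form, query))
```

Because a token is returned as soon as any one of its analyses matches, a query for particles retrieves every occurrence of ja, including those where it is in fact a pronoun in context. This is why the search results need to be checked manually.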
This corpus is the main research instrument developed for the project “Contact-induced language change in situations of non-stable bilingualism—Its limits and modelling: Slavic (social) dialects in Albania,” funded by the DFG (German Research Foundation), project number 8750/1-1 (October 16th, 2019–April 30th, 2024). The principal investigator was Dr. Maxim Makartsev. See List of project-related publications.
The concept, development, and realization of this project would not have been possible without the constant support of Prof. Dr. Gerd Hentschel, to whom I owe a gratitude deeper than words can express. I am deeply indebted to Prof. Dr. Jan Patrick Zeller and my other colleagues from the Institute for Slavistics (Carl von Ossietzky Universität Oldenburg) for their support and thorough feedback on my project during its various stages.
Berman, Ruth A., Dan I. Slobin, Sven Stromqvist, and Ludo T. Verhoeven. 1994–2004. Relating Events in Narrative. Hillsdale, N.J.: L. Erlbaum Associates.
Bojović, Jovan R., ed. 1991. Stanovništvo slovenskog porijekla u Albaniji : zbornik radova sa međunarodnog naučnog skupa održanog u Cetinju 21, 22. i 23. juna 1990. Titograd: Stručna knjiga.
Čëxa, Oksana V. 2009. “Novogrečeskaja leksika narodnoj astronomii v sopostavlenii s balkanoslavjanskoj: Luna i lunnoe vremja (ėtnolingvističeskij aspekt).” Ph.D., Institute of Slavic Studies, Russian Academy of Sciences. https://inslav.ru/event/chyoha-oksana-vladimirovna-novogrecheskaya-leksika-narodnoy-astronomii-v-sopostavlenii-s.
Cvetanovski, Goce. 2010. Govorot na makedoncite vo Mala Prespa: zapadnoprespanski govor. Skopje: Institut za makedonski jazik “Krste Misirkov”.
Hentschel, Gerd, and Jan P. Zeller. 2013. “Gemischte Rede, gemischter Diskurs, Sprechertypen: Weißrussisch, Russisch und gemischte Rede in der Kommunikation weißrussischer Familien.” In Wiener Slawistischer Almanach 70, edited by Aage A. Hansen-Löve and Tilmann Reuther, 127–55. München, Berlin, Wien: Peter Lang.
Makartsev, Maxim. 2017. “Ėtjudy k balkanskomu bestiariju: Kukuška.” Živaja starina 95 (3): 46.
———. 2023. “Razvoj balkanoslavenskoga tipa futura u štokavskim iseljeničkim dijalektima u Albaniji i jezički kontakti.” Književni jezik (34): 41–69.
Makartsev, Maxim, and Natalia Kikilo. 2022. “Some Tendencies in the Morphosyntax of the Migrational Shtokavian Dialects in Albania (Shijak and Myzeqe) And Slavic-Albanian Language Contact.” Slavic World in the Third Millennium 17 (1-2): 120–41. doi:10.31168/2412-6446.2022.17.1-2.07.
Mayer, Mercer. 1969. Frog, Where Are You? Sequel to a Boy, a Dog and a Frog. New York: Dial Books for Young Readers (a division of Penguin Putnam Inc.).
Mazon, André. 1936. Documents, contes et chansons slaves de l’Albanie du Sud. Bibliothèque d’études balkaniques 5. Paris: Librairie Droz.
Mazon, André, and Maria Filipova-Bajrova. 1965. Documents slaves de l’Albanie du Sud: II. Pièces complémentaires. Bibliothèque d’études balkaniques 8. Paris: Institut d’études slaves.
Plotnikova, Anna A. 2009. Materialy dlja ėtnolingvističeskogo izučenija balkanoslavjanskogo areala. 2, revised. Moskva: Institut slavjanovedenija RAN.
Seliščev, Afanasij M. 1931. Slavjanskoe naselenie v Albanii (s illjustracijami v tekste i s kartoju Albanii). Sofia.
Sobolev, Andrey N., and Aleksandr Novik. 2013. Golo Bordo (Gollobordë), Albanija: Iz materialov balkanskoj ėkspedicii RAN i SPbGU 2008-2010 gg. Materialien zum Südosteuropasprachatlas Bd. 6. Sankt-Peterburg: Nauka.
———. 2017. Gollobordë (Golo Bordo), Shqipëri: Nga materialet e ekspeditës ballkanike të AShR-së dhe UShSt-P-së në vitet 2008-2010. Translated by Ligor Cullufe. Tiranë: Botimet Toena.
———. 2018. Golo Brdo: Od materijalite na balkanskata ekspedicija na RAN i SPbDU vo 2008-2010 godina. Materialien zum Südosteuropasprachatlas Band 6. Skopje, Sankt Peterburg: Univerzitet "Sv. Kiril i Metodij"; Institut za makedonski jazik "Krste Misirkov"; "Nauka".
Steinke, Klaus, and Xhelal Ylli. 2007. Die slavischen Minderheiten in Albanien (SMA): 1. Teil. Prespa-Vërnik-Boboshtica. Slavistische Beiträge 458. München: Otto Sagner.
———. 2008. Die slavischen Minderheiten in Albanien (SMA): 2. Teil. Golloborda-Herbel-Kërçishti i Epërm. Slavistische Beiträge 462. München: Otto Sagner.
———. 2010. Die slavischen Minderheiten in Albanien (SMA): 3. Teil. Gora. Slavistische Beiträge 474. München: Otto Sagner.
———. 2013. Die slavischen Minderheiten in Albanien (SMA): 4. Teil. Vraka-Borakaj. Slavistische Beiträge 491. München, Berlin: Sagner.
Tončeva, Veselka. 2014. Našencite v Albanija: Istorija, ezik, tradicii. Sofia: Ongăl.
Vidoeski, Božidar. 1998. Dijalektite na makedonskiot jazik. Dijalektite na makedonskiot jazik 1. Skopje: Makedonska Akademija na naukite i umetnostite.
Ylli, Xhelal. 1997. Das slavische Lehngut im Albanischen. Teil 1 : Lehnwörter. Slavistische Beiträge. Digitale Ausgabe 350. München: Verlag Otto Sagner.
———. 2000. Das slavische Lehngut im Albanischen. Teil 2 : Ortsnamen. Slavistische Beiträge. Digitale Ausgabe 395. München: Verlag Otto Sagner.
If you have questions, would like to propose collaboration, or noticed an error in the corpus, please contact Dr. Maxim Makartsev.