• The Zotero library underlying the CEToM bibliography is now public and can be viewed here.
  • We would like to thank Prof. Dr. Thomas Oberlies and Pratik Rumde from the Seminar für Indologie und Tibetologie of the Georg-August-Universität Göttingen for providing our project with scans of the nachlass of Wilhelm Siegling. The nachlass includes letters to and from Siegling throughout his career that are of great importance to the history of the field of Tocharian studies. This material will be published on CEToM, accompanied by transcriptions of the letters, in the course of 2024.



Philological method

We distinguish between transliteration and transcription. All forms in the word and grammar database are based on the transcription.

Manuscript pressmarks

Many fragments are known by more than one pressmark. In general, we refer to fragments using the most recent, unified systems. The most important are:

  • IOL and Or for the London collection
  • PK for the Paris Pelliot collection
  • SI for the St. Petersburg collection
  • THT for the Berlin Turfan collection
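
For readers who handle pressmarks programmatically, the prefixes listed above can be kept in a small lookup table. The following Python sketch is illustrative only: the function name `collection_for` is hypothetical, and it assumes that the collection prefix is the first whitespace-separated element of a pressmark, which may not hold for every pressmark format.

```python
# Prefix-to-collection table, taken from the list above.
COLLECTIONS = {
    "IOL": "London collection",
    "Or": "London collection",
    "PK": "Paris Pelliot collection",
    "SI": "St. Petersburg collection",
    "THT": "Berlin Turfan collection",
}


def collection_for(pressmark):
    """Return the collection for a pressmark, or None if the prefix is not listed.

    Assumption: the prefix is the first whitespace-separated element.
    """
    parts = pressmark.split()
    if not parts:
        return None
    return COLLECTIONS.get(parts[0])


print(collection_for("THT 338"))   # Berlin Turfan collection
print(collection_for("PK AS 7D"))  # Paris Pelliot collection
```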


In general, the transliteration is meant to reflect the original manuscript as closely as possible.

Therefore, possible restorations are not proposed here but in the transcription only. An exception is a missing virāma, which is added in the transliteration but not in the transcription, because in the transcription virāma is never marked. Furthermore, to keep the transliteration readable, and in accordance with the tradition of Tocharian studies, word division is already applied here, sandhi is indicated with the equal sign "=", and compounds and clitic pronouns are marked with a hyphen.

Another exception concerns forms that were clearly read by editors in past editions but are now lost from the original manuscript; these forms are marked by "(( ))" in the transliteration but are left unmarked in the transcription; see, e.g., THT 338.

Corrections made in the original manuscript (either with or without correction cross) are marked by "« »". The corrected spelling is given only in the philological commentary, where it is explained from which (incorrect) form it has been corrected.

In a similar way, deleted akṣaras in the original manuscript are marked by "«† »".

We distinguish between:

  • virāma stroke "\" and virāma stroke with dot "\".
  • unreadable and lost akṣaras. Merely unreadable ones will be rendered by "–" (one single akṣara) or "·" (part of an akṣara), while "(–)" and "(·)" indicate completely lost text.


The square brackets "[ ]" that mark damaged akṣaras in the transliteration are no longer indicated in the transcription. In contrast, round brackets "( )" are used to indicate restorations of missing akṣaras and akṣara parts.

Fremdzeichen and virāma

The spelling of the Fremdzeichen, special Tocharian akṣaras with an inherent vowel ä instead of a, is simplified. The consonantal element of the Fremdzeichen akṣara is rendered with its regular counterpart and the inherent vowel is given as ä. The virāma stroke and dot, including any silent ä, are no longer indicated. For instance:

  • ñem\ (transliteration) = ñem (transcription)
  • mant\ (transliteration) = mänt (transcription)
  • lāñcä\ (transliteration) = lāñc (transcription)
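
The virāma part of this simplification can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the database: the function name `strip_virama` is hypothetical, and the sketch assumes that the virāma stroke is a final backslash and that an ä written directly before it is silent. It does not cover the rewriting of Fremdzeichen, because that marking cannot be recovered from a plain string.

```python
import unicodedata


def strip_virama(form):
    """Drop a final virāma stroke (backslash) and a silent ä directly before it."""
    # Normalise to composed characters so that ä is a single code point.
    form = unicodedata.normalize("NFC", form)
    if form.endswith("\\"):
        form = form[:-1]
        if form.endswith("\u00e4"):  # ä, assumed silent before virāma
            form = form[:-1]
    return form


print(strip_virama("ñem\\"))    # ñem
print(strip_virama("lāñcä\\"))  # lāñc
```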


Misspelled forms are kept as such in the transliteration but corrected in the transcription, marked with "{ }", e.g. {p}rocer ‘brother’ in the transcription for ṣrocer in the transliteration and in the manuscript itself.

Our corrections of misspellings must not be confused with the original corrections made in the manuscript, which are marked in the transliteration but not in the transcription. Similarly, akṣaras deleted in the original manuscript are omitted in the transcription.
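
The treatment of manuscript-level marks described above can be summarised in a short Python sketch. It is an illustration under stated assumptions rather than the actual conversion routine: the function name `strip_manuscript_marks` is hypothetical, the input string in the example is constructed and not a real manuscript reading, and editorial restorations "( )" and corrections "{ }" are not generated because they require philological judgement.

```python
import re


def strip_manuscript_marks(translit):
    """Remove marks that the transcription does not keep."""
    # Akṣaras deleted in the manuscript, marked by «† », are omitted entirely.
    text = re.sub(r"«†[^»]*»", "", translit)
    # For the remaining marks only the delimiters are dropped:
    # (( )) lost readings, « » manuscript corrections, [ ] damaged akṣaras.
    for mark in ("((", "))", "«", "»", "[", "]"):
        text = text.replace(mark, "")
    return text


# Constructed example string, for illustration only.
print(strip_manuscript_marks("[ka]«†ra»«tu» ((ñe))m"))  # katu ñem
```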

Restoration across line ends

If text preceding the first line of a fragment or following its last line can be restored, and the margin of the fragment is preserved, we introduce a new line. If the text is actually attested on another leaf, this is indicated; cf. fragment PK AS 7D a (which is immediately preceded by PK AS 7C b):

PK AS 7C b 6 (mā no yamaṣäṃ mā tu)
a1 yāmtsi pyutka(ṣṣäṃ)

Metrical analysis

Unfortunately, no comprehensive study of the Tocharian metre is available. However, thanks to observations of, in particular, Sieg and Siegling, the following basic principles have been established:

  • A poem consists of stanzas. The minimum number is 1, but in longer verse works stanza numbers well over 50 are attested.
  • A metrical stanza usually consists of either four or (rarely) five lines, called pādas.
  • Within pādas, the number of syllables is essential; syllable length plays no role. Pādas may be of equal length, with each pāda having for instance 15 syllables, or they may be of unequal length, for instance 20||22||10||15||. With pādas of unequal length, the pattern of the number of syllables is identical for each stanza.
  • Pādas are further subdivided into cola. A pāda of 15 syllables may for instance consist of two cola of 8 and 7 or of 7 and 8 syllables: 8|7 or 7|8. The first colon is followed by a caesura.
  • In addition, Winter 1959 noted that cola often show further subdivisions into what we call subcola. In contrast to the quite regular position of caesurae dividing cola, there is much more license with respect to the possible word-end positions within a colon.
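
The numeric notation used in these principles can be read mechanically. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes that "|" separates cola within a pāda and that "||" closes each pāda in a stanza pattern, as in the examples above.

```python
def parse_cola(scheme):
    """Parse a colon scheme such as "8|7" into a list of syllable counts."""
    return [int(n) for n in scheme.split("|")]


def parse_stanza(pattern):
    """Parse a stanza pattern such as "20||22||10||15||" into pāda lengths."""
    return [int(n) for n in pattern.split("||") if n]


def fits_pada(scheme, pada_length):
    """Check that the cola of a scheme add up to the length of the pāda."""
    return sum(parse_cola(scheme)) == pada_length


print(fits_pada("8|7", 15))              # True
print(fits_pada("7|8", 15))              # True
print(parse_stanza("20||22||10||15||"))  # [20, 22, 10, 15]
```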

In metrical passages we use the following symbols to mark pāda and colon ends as far as they can be established:

  • # = pāda end
  • #34a = end of pāda 34a
  • ; = colon end
  • We do not mark subcola ends.
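
Text marked up with these symbols can be split into pādas and cola automatically. The following Python sketch assumes that a pāda label consists of digits optionally followed by a single lowercase letter (as in "34a"); the function name `split_padas` is hypothetical, and the example input uses placeholder words rather than real text.

```python
import re

# "#" optionally followed by a pāda label such as "34a".
PADA_END = re.compile(r"#(\d+[a-z]?)?")


def split_padas(text):
    """Split marked-up verse into (label, cola) pairs.

    Returns the list of completed pādas and any text left after the
    last pāda-end marker (an incomplete pāda).
    """
    padas = []
    pos = 0
    for match in PADA_END.finditer(text):
        body = text[pos:match.start()]
        cola = [c.strip() for c in body.split(";") if c.strip()]
        padas.append((match.group(1), cola))
        pos = match.end()
    return padas, text[pos:].strip()


# Placeholder words, for illustration only.
padas, rest = split_padas("aa bb ; cc #34a dd ; ee #34b ff")
print(padas)  # [('34a', ['aa bb', 'cc']), ('34b', ['dd', 'ee'])]
print(rest)   # ff
```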

In addition to a certain syntactic freedom to match the number of syllables in pādas and cola, metrical passages may also show other linguistic characteristics not or only rarely found in prose passages. For Tocharian B, for instance, the following phenomena are specific to verse texts: the change of word-initial vocalic o into the glide w, type wnolme for onolme ‘being’; the syncope of an underlying word-internal ä, type āstre for astare ‘pure’ (see most recently Pronk 2009); the occurrence of so-called o mobile, type ñemo for ñem ‘name’ (see most recently Malzahn 2012a); or the lengthening of vowels in absolute word-final position (see Kosta 1988). Even fragmentary metrical passages can thus be identified if they contain such linguistic characteristics. However, in these cases the pāda/colon division usually remains uncertain.


The symbol "=" denotes sandhi and is placed where a sound has disappeared. For instance, -e a- resulting in -a- is noted as -= a-. In those rare cases where the sandhi product differs from both original sounds that coalesced, "=" is written instead of the first sound. For instance, -a a- resulting in -ā- is noted as -= ā-.

Vowel sandhi

  • TB aiskeman= āyor for aiskemane āyor
  • TB tak= ānaiśai for taka anaiśai
  • The development of a glide in sandhi is not marked by "="; e.g., TB kautsy akemane for kautsi akemane
  • Sometimes, vowel sandhi coincides with a colon end, cf. TB aknātsaṃñ= ; emāno for aknātsaṃñe ; amāno.

Consonant sandhi

  • TB os= tärkau for ost tärkau
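
Because "=" is always written at the end of the first word, sandhi sites can be located with a simple scan. The Python sketch below is illustrative only: the function name `find_sandhi` is hypothetical, and it assumes that words are separated by whitespace and that the metrical markers ";" and "#" stand as separate tokens. It locates sandhi sites but does not restore the lost sound, which cannot be derived from the notation alone.

```python
def find_sandhi(text):
    """Return (first word, following word) pairs for every sandhi site."""
    # Skip metrical markers so that a colon end does not hide the next word.
    words = [t for t in text.split() if t != ";" and not t.startswith("#")]
    sites = []
    for i, word in enumerate(words):
        if word.endswith("="):
            following = words[i + 1] if i + 1 < len(words) else None
            sites.append((word, following))
    return sites


print(find_sandhi("aiskeman= āyor"))      # [('aiskeman=', 'āyor')]
print(find_sandhi("aknātsaṃñ= ; emāno"))  # [('aknātsaṃñ=', 'emāno')]
```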

Script type

In general, we follow the classification of the script types proposed by Malzahn 2007a, where a very archaic, middle archaic, and archaic phase are distinguished beside the classical and late ductus of literary manuscripts. In contrast, non-literary documents often show a cursive ductus. For the development of the Tocharian variant of the Brāhmī script in general, see Sander 1968.

Linguistic stages

In general, we follow the chronological classification of Tocharian B by Peyrot 2008, distinguishing an archaic, a classical, and a late phase. Although these linguistic layers are basically chronological, pure texts for any of the stages are exceedingly rare. Therefore, we have a second classification unit “Additional linguistic characteristics” for texts that show: a) some archaic forms without being completely archaic; b) some late forms while being in general classical (or even archaic); c) both archaic and late forms; and d) hypercorrect forms such as träṅkä in the archaic manuscript B334 a 1 and a 5 for the o-stem träṅko ‘sin’.

Technical notes

  • Programming of the system is not yet finished. Above all, a useful search engine still has to be programmed.
  • Some computers and/or browsers may have difficulty displaying special characters correctly. Please compare the display of the following examples of frequently used characters with their description:
    Character | Unicode code point (and block) | Description
    ä | U+00E4 (Latin-1 Supplement) | Latin small letter A with diaeresis
    ñ | U+00F1 (Latin-1 Supplement) | Latin small letter N with tilde
    ā | U+0101 (Latin Extended-A) | Latin small letter A with macron
    ś | U+015B (Latin Extended-A) | Latin small letter S with acute
    ṃ | U+1E43 (Latin Extended Additional) | Latin small letter M with dot below
    ṣ () | U+1E63 (Latin Extended Additional) | Latin small letter S with dot below (and underlined)
    r̥ | U+0072 (Basic Latin) + U+0325 (Combining Diacritical Marks) | Latin small letter R + Combining ring below
    u̯ | U+0075 (Basic Latin) + U+032F (Combining Diacritical Marks) | Latin small letter U + Combining inverted breve below
  • Since some browsers display the underlined ‹ṣ› (i.e. ‹›) and the underlined ‹s› (i.e. ‹s›) identically, we print out the first one as ‹› to make it distinguishable.
  • See also: abbreviations and symbols.
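
To check which code points a copied form actually contains, the characters can be listed with Python's standard `unicodedata` module. This is a small diagnostic sketch and not part of the website; the function name `describe` is hypothetical.

```python
import unicodedata


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# ä, ṃ, and r followed by a combining ring below, written as escapes
# so that the sample itself is unambiguous.
for sample in ("\u00e4", "\u1e43", "r\u0325"):
    for code_point, name in describe(sample):
        print(code_point, name)
```

A character that is displayed correctly but reported with an unexpected code point was probably normalised or substituted when the text was copied.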

Quoting our website

You are welcome to quote from the text edition and word database provided on our website in the following way:

A Comprehensive Edition of Tocharian Manuscripts, URL: (retrieved: Dec. 03, 2023)

As far as manuscripts are concerned, please quote the respective editor(s) of a certain text as listed there under the heading “Editor”. For instance A 32:

Gerd Carling (in collaboration with Melanie Malzahn, Michaël Peyrot, and Georges-Jean Pinault), “A 32”, in: A Comprehensive Edition of Tocharian Manuscripts, URL: 32 (retrieved: Dec. 03, 2023)



Kosta 1988

Kosta, Peter. 1988. “Zur Bedeutung der unterschiedlichen Schreibungen der Vokale a/ā, u/ū, i/ī im Auslaut der toch. B-Wörter (an Hand der MQ- und MQR-Texte).” In Studia Indogermanica et Slavica. Festgabe für Werner Thomas zum 65. Geburtstag, edited by Peter Kosta, Gabriele Lerch, and Peter Olivier, 153–73. München: Sagner.

Malzahn 2007a

Malzahn, Melanie. 2007a. “A preliminary survey of the Tocharian glosses in the Berlin Turfan Collection.” In Instrumenta Tocharica, edited by Melanie Malzahn, 301–19. Heidelberg: Winter.

Malzahn 2012a

Malzahn, Melanie. 2012a. “Now you see it, now you don’t — Bewegliches –o in Tocharisch B.” In Linguistic developments along the Silk Road: Archaism and Innovation in Tocharian, edited by Olav Hackstein and Ronald I. Kim, 834:33–82. Sitzungsberichte der Österreichischen Akademie der Wissenschaften, Philosophisch-historische Klasse. Wien: Verlag der ÖAW.

Pronk 2009

Pronk, Tijmen. 2009. “Reflexes of the deletion and insertion of Proto-Tocharian *ä in Tocharian B.” Tocharian and Indo-European Studies 11: 73–123.

Sander 1968

Sander, Lore. 1968. Paläographisches zu den Sanskrithandschriften der Berliner Turfansammlung. Wiesbaden: Franz Steiner.

Winter 1959

Winter, Werner. 1959. “Zur „tocharischen“ Metrik.” In Akten des XXIV. Internationalen Orientalistenkongresses München 1957, edited by Herbert Franke, 520–21. Wiesbaden: Steiner.