Tokenization
Tokenization is the first layer of annotation. It identifies the "atoms" to which annotation units are attached. Different tokenization schemes are possible, depending on how "token" is defined (e.g., as a morpheme, a morphosyntactic word, or a prosodic word). The corpus currently contains one level of tokenization, a morphosyntactic one.
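As a minimal illustration of how a morphosyntactic tokenization can differ from a plain whitespace split, the sketch below treats an English contraction as two tokens, since each part carries its own morphosyntactic features. The rule set and function name are hypothetical, chosen for illustration only, and do not reflect the corpus's actual tokenizer.

```python
import re

def morphosyntactic_tokenize(text):
    """Illustrative morphosyntactic tokenizer: splits on whitespace,
    strips sentence punctuation, and separates "n't" contractions
    into two tokens (e.g. "don't" -> "do" + "n't")."""
    tokens = []
    for word in text.split():
        word = word.strip(".,!?;:")
        m = re.match(r"(.+)(n't)$", word)
        if m:
            # A contracted form is two morphosyntactic words.
            tokens.extend([m.group(1), m.group(2)])
        else:
            tokens.append(word)
    return tokens

# A whitespace split yields 3 tokens; the morphosyntactic
# scheme yields 4 for the same sentence.
print(morphosyntactic_tokenize("I don't know."))
# → ['I', 'do', "n't", 'know']
```

The point is that the same text can receive different token boundaries under different schemes, and annotations attached to one scheme's tokens need not align with another's.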