
Word and other token tagging ("Tokenization")


All original (Greek and Latin) characters in the edition should be tagged with one of the following elements (modern punctuation may be left untagged; ancient punctuation should use <g>):
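
As a rough illustration only, a tokenized phrase with an ancient interpunct might look like the sketch below. It assumes the standard TEI <w> element for words and an <ab> wrapper, and the ref value on <g> is a hypothetical pointer that depends on the project's own glyph declarations; the text itself is invented for the example.

<!-- Hypothetical fragment: the wording and the ref value are assumptions, not taken from a real edition -->
<ab>
     <w>dis</w>
     <g ref="#interpunct"/>
     <w>manibus</w>
</ab>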



In addition, any reference to a person (which may consist of names, ordinary words, place names, etc.) should be tagged as <persName>. Each <persName> must take one of the following types:




<persName type="attested">
     <name type="praenomen"><expan>M<ex>arcus</ex></expan></name>
     <name type="gentilicium">Iulius</name>
     <name type="cognomen">Aurelianus</name>