WordTagging

Page history last edited by Gabriel Bodard 2 years ago

Word and other token tagging/"Tokenization"

 

All original (Greek and Latin) characters in the edition should be tagged with one of the following elements (modern punctuation may be left untagged; ancient punctuation should use <g>):

 

  • w - a lexical word, not known to be a proper name etc.
    • if an incomplete word, use the attribute @part
      • part="I" - the initial part of a word (i.e. the end is missing or unresolvable)
      • part="M" - the middle part of a word (i.e. the beginning and end are both missing or unresolvable)
      • part="F" - the final part of a word (i.e. the beginning is missing or unresolvable) 
      • (rarely) part="Y" - an obviously incomplete word, where it is not clear whether the surviving part is initial, middle or final
  • name - a personal name (including cognomina; but not "Imperator").
    • an imperial cognomen such as "Sarmaticus" should be tagged as a name.
    • name cannot take @part, so in the case of an incomplete name, a seg element needs to appear inside the name, with @part (as for words, above)
  • placeName - a name of a place
    • if a proper adjective - type="ethnic"
    • for a colony name, e.g. colonia Septimia Lepcis Magna - tag "colonia" as a word, "Septimia" as a name and "Lepcis Magna" as a placeName:
      i.e. <placeName ref="mentionedplace.xml#p123"><w>colonia</w> <name>Septimia</name> <placeName>Lepcis Magna</placeName></placeName>
  • num - a numeral
  • g - a non-alphabetic symbol such as "denarius," "leaf" or "year" (one for which no Unicode code point exists, which is not easy to type, or which is not traditionally printed as a character in Leiden)
  • abbr - an abbreviation for which we do not know the expansion; e.g. "υ(...)"
  • orig - none of the above: text that we cannot resolve in any way (only if the editor has put this word in uppercase)
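
A minimal sketch of how these elements might look in an edition, assuming common EpiDoc/TEI conventions. All of the words and names are invented for illustration, the wrapping ab is only there to make the fragment well-formed, and the @value on num and the @type on g are common TEI/EpiDoc usage rather than requirements stated on this page:

```xml
<ab>
  <!-- complete lexical word -->
  <w>fecit</w>
  <!-- incomplete word: only the final part survives -->
  <w part="F">erunt</w>
  <!-- incomplete name: name cannot take @part, so an inner seg carries it -->
  <name><seg part="I">Aurel</seg></name>
  <!-- proper adjective derived from a place name -->
  <placeName type="ethnic">Lepcitanus</placeName>
  <!-- numeral -->
  <num value="3">III</num>
  <!-- non-alphabetic symbol -->
  <g type="denarius"/>
  <!-- abbreviation with unknown expansion, Leiden "υ(...)" -->
  <abbr>υ</abbr>
  <!-- unresolved text that the editor has printed in uppercase -->
  <orig>ΑΒΓ</orig>
</ab>
```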

 

In addition, any reference to a person (which may be made up of names and/or words/placenames, etc.) should be tagged as persName. Each persName must take one of the following types:

 

  • attested - any person attested other than emperors, consuls, gods etc.
  • ruler - a member of the imperial or ruling families (in former projects "emperor")
  • divine - a god, hero, angel, personification or other divine entity
  • other - mostly historical or literary figures (rarely used)
  • consular - only if a consul/archon/priest cited for dating (even more rarely used)

 

example:

<persName type="attested">
     <name type="praenomen"><expan>M<ex>arcus</ex></expan></name>
     <name type="gentilicium">Iulius</name>
     <name type="cognomen">Aurelianus</name>
</persName>
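
The other types follow the same pattern. A hypothetical sketch, assuming that a title such as "Imperator" sits inside the persName as a plain word and that divine epithets are tagged as words; the names are chosen only for illustration:

```xml
<ab>
  <!-- ruler: "Imperator" is a word, the imperial cognomen "Sarmaticus" is a name -->
  <persName type="ruler">
    <w>Imperator</w>
    <name>Marcus</name>
    <name>Aurelius</name>
    <name>Antoninus</name>
    <name>Sarmaticus</name>
  </persName>
  <!-- divine: a god's name followed by epithets tagged as words -->
  <persName type="divine">
    <name>Iovi</name>
    <w>Optimo</w>
    <w>Maximo</w>
  </persName>
</ab>
```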

 
