they should be formatted as two files of text lines sharing an index, a la:

  stem_index|stem

and then:

  word|stem_index|POS_index

If a word doesn't have a stem (say, conjunctions and pronouns), it should be included as it is. Is there such a thing? If not, how would you suggest one could build it with available resources?
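To make the layout concrete, here is a minimal sketch in Python of how the two files could be generated, assuming some lemmatizer and tagger has already produced (word, stem, POS) triples. The function name build_index, the sample triples, and the POS labels are all hypothetical. It also assumes that a word with no stem is entered in the stem file as its own stem, and that POS_index points into a separate small table of tags, neither of which is settled above.

```python
def build_index(entries):
    """Build the two line-oriented tables from (word, stem_or_None, pos) triples.

    Returns (stem_lines, word_lines, pos_ids). All names here are illustrative.
    """
    stem_ids = {}    # stem -> stem_index, in first-seen order
    pos_ids = {}     # POS tag -> POS_index, in first-seen order
    word_lines = []
    for word, stem, pos in entries:
        # Assumption: a word without a stem (conjunction, pronoun)
        # is stored as its own stem, i.e. "included as it is".
        key = stem if stem else word
        stem_index = stem_ids.setdefault(key, len(stem_ids))
        pos_index = pos_ids.setdefault(pos, len(pos_ids))
        word_lines.append(f"{word}|{stem_index}|{pos_index}")
    # dicts keep insertion order, so lines come out sorted by index
    stem_lines = [f"{index}|{stem}" for stem, index in stem_ids.items()]
    return stem_lines, word_lines, pos_ids


# Hypothetical sample input; real triples would come from a tagged lexicon.
sample = [
    ("running", "run", "VERB"),
    ("runs", "run", "VERB"),
    ("and", None, "CONJ"),
    ("she", None, "PRON"),
]
stem_lines, word_lines, pos_ids = build_index(sample)
print("\n".join(stem_lines))
print("\n".join(word_lines))
```

Writing stem_lines and word_lines to two text files gives the shared-index pair; pos_ids would need to be saved somewhere as well so that POS_index can be decoded later.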
lbrtchx
