Hello all

Is there a mechanism (a lookup file, etc.) that overrides the window size
set on the term annotator or the chunker? Changing the window size from
the default of 3 to 2 opens the floodgates to false acronym annotations. So
my question is whether there is a place where one can register specific
two-character terms, for example BP or PT, that will be found even with the
window size set to 3.

A similar question about genes. On adding the HGNC vocabulary, I noticed
that there are many thousands of gene aliases that overlap other
common acronyms and English words such as trip, spring, plan, bed, yes,
rip, prn, etc. I'm not sure these aliases are ever used. So I created
a sed script with 4000 regular expressions to remove the 2- and 3-letter gene
synonyms from a script file. I will suppress the 4-letter synonyms
manually, and only where they cause trouble. But does anyone have a more
elegant solution?
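For comparison, a rule based on term length plus a short whitelist could replace
the 4000 individual regexes. The sketch below is only an illustration. It assumes
a hypothetical pipe-delimited `ID|term` line format, not the actual layout of the
dictionary script file. The names `filter_synonyms`, `KEEP_SHORT` and `suppress`
are also made up for this example.

```python
# Hypothetical whitelist of short terms that should survive filtering.
KEEP_SHORT = {"BP", "PT"}


def filter_synonyms(lines, min_len=4, keep=KEEP_SHORT, suppress=frozenset()):
    """Yield dictionary lines, dropping short or manually suppressed terms.

    Assumes (hypothetically) that each term row looks like 'ID|term'.
    Rows without a '|' are passed through unchanged.
    Comparisons against `keep` and `suppress` are case-insensitive.
    """
    for line in lines:
        ident, sep, term = line.rstrip("\n").partition("|")
        if not sep:
            yield line  # not a term row; keep as-is
            continue
        t = term.strip().upper()
        if t in suppress:
            continue  # manually suppressed term (e.g. a troublesome 4-letter alias)
        if len(t) >= min_len or t in keep:
            yield line  # long enough, or explicitly whitelisted
```

Usage: `filter_synonyms(open("genes.txt"), suppress={"TRIP", "PLAN"})` would drop
all 2- and 3-letter aliases except BP and PT, as well as the listed 4-letter ones.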

Peter
