Hello all,

Is there a mechanism, such as a lookup file, that overrides the window size set on the term annotator or the chunker? Changing the window size from the default of 3 to 2 opens the floodgates to false acronym annotations. So my question is whether there is a place where one can register specific two-character terms, for example BP or PT, which will be found even with the window size set to three.
A similar question about genes. After adding the HGNC vocabulary I noticed that many thousands of gene aliases overlap with other common acronyms and English words, such as trip, spring, plan, bed, yes, rip and prn. I'm not sure whether these aliases are ever used. So I created a sed script with 4000 regular expressions to remove the 2- and 3-letter gene synonyms from a script file. I will only suppress the 4-letter synonyms manually where they cause trouble. But does anyone have a more elegant solution?

Peter
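
For illustration only, the filtering described above (drop the 2- and 3-letter synonyms, but keep a few registered short terms such as BP or PT) could be sketched as one length rule plus an allowlist, instead of thousands of separate sed expressions. This is a sketch under assumptions, not a cTAKES feature: the bar-separated line layout, the column position of the term, the made-up identifiers, and the names `KEEP_SHORT`, `keep_entry` and `filter_lines` are all hypothetical and would need adapting to the actual dictionary or script file.

```python
# Hypothetical allowlist: short terms that must survive the filter.
KEEP_SHORT = {"BP", "PT"}

# Terms of this length or shorter are dropped unless allowlisted.
MAX_SUPPRESSED_LEN = 3


def keep_entry(term, keep_short=KEEP_SHORT, max_len=MAX_SUPPRESSED_LEN):
    """Return True if a dictionary term should stay in the lookup file."""
    term = term.strip()
    if len(term) > max_len:
        return True
    # Short term: keep it only if it is allowlisted (case-insensitive).
    return term.upper() in keep_short


def filter_lines(lines, term_column=1, sep="|"):
    """Yield only the lines whose term passes keep_entry.

    Assumes one entry per line, fields separated by `sep`, and the term
    text in column `term_column` (0-based). Both are assumptions about
    the file layout. Lines with too few fields are passed through as-is.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= term_column:
            yield line
            continue
        if keep_entry(fields[term_column]):
            yield line


if __name__ == "__main__":
    # Made-up identifiers, for demonstration only.
    sample = ["ID1|BP\n", "ID2|rip\n", "ID3|bed\n", "ID4|trip\n", "ID5|PT\n"]
    for kept in filter_lines(sample):
        print(kept, end="")
```

With the sample data, the entries for rip and bed are dropped, while BP, PT and the 4-letter trip are kept. Troublesome 4-letter synonyms would still need to be handled separately, for example with a second, explicit blocklist.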