>
> my question is whether there's a place where one can register specific two
> character terms, for example BP or PT which will be found even with a
> window size set to three.


My brute-force approach is pretty brutal: Change the window size to two,
annotate terms, then remove all two-letter annotations except the very few
I'm interested in.
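
A minimal sketch of that clean-up step, in Python for brevity. The `Annotation` class, the `KEEP_TWO_LETTER` allowlist and the function name are hypothetical stand-ins, not part of any annotator's API; the idea is just to drop every two-character annotation unless its text is on the allowlist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a term annotation: offsets plus covered text."""
    begin: int
    end: int
    text: str


# Hypothetical allowlist of the few two-letter terms worth keeping.
KEEP_TWO_LETTER = {"BP", "PT"}


def filter_short_annotations(annotations, keep=KEEP_TWO_LETTER, length=2):
    """Drop annotations of exactly `length` characters unless allowlisted.

    Matching is case-insensitive here; that is a design choice, and a
    case-sensitive comparison may give fewer false positives.
    """
    keep_upper = {k.upper() for k in keep}
    return [
        a for a in annotations
        if len(a.text) != length or a.text.upper() in keep_upper
    ]
```

Run after annotating with the window size set to two, this keeps longer terms untouched and removes only the unwanted two-letter hits.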

On Mon, Aug 24, 2020 at 9:07 PM Peter Abramowitsch <pabramowit...@gmail.com>
wrote:

> Hello all
>
> Is there a mechanism, a lookup file, etc. which overrides the window size
> set on the term annotator or the chunker? Changing the window size from
> the default of 3 to 2 opens the floodgates to false acronym annotations. So
> my question is whether there's a place where one can register specific two
> character terms, for example BP or PT which will be found even with a
> window size set to three.
>
> A similar question about genes. On adding the HGNC vocabulary I notice
> that there are many thousands of aliases for genes which overlap other
> common acronyms and English words such as trip, spring, plan, bed, yes,
> rip, prn etc. I'm not sure if these aliases are ever used. So I created
> a sed script with 4000 regular expressions to remove the 2 and 3 letter gene
> synonyms from a script file. I will only suppress the 4 letter synonyms
> manually where they cause trouble. But does anyone have a more elegant
> solution?
>
> Peter
>
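
On the gene-alias question quoted above, one possible alternative to thousands of individual regular expressions is a single length-based filter over the synonym file. The sketch below assumes a hypothetical pipe-delimited layout with the synonym in the last field; the function name, separator and field position are illustrative and would need adjusting to the real file.

```python
def drop_short_synonyms(lines, min_len=4, sep="|", field=-1, keep=frozenset()):
    """Keep only lines whose synonym has at least `min_len` characters.

    `sep` and `field` describe an assumed file layout, not a known format.
    Terms listed in `keep` survive regardless of length.
    """
    kept = []
    for line in lines:
        term = line.rstrip("\n").split(sep)[field].strip()
        if len(term) >= min_len or term in keep:
            kept.append(line)
    return kept
```

With the default `min_len=4`, synonyms of three characters or fewer are removed in one pass, and the four-letter ones remain for manual review as described in the quoted message.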
