This is a type of word sense disambiguation; there is a lot of literature
on this subject.  Co-occurrence is one way of doing it, but not necessarily
the best; you need a lot of annotated data for it to work well.


On Thu, Aug 21, 2014 at 9:08 PM, John Green <[email protected]>
wrote:

> Are there any acronym annotators and disambiguators? What are people doing
> in production elsewhere? I'm learning the heart of cTAKES and UIMA by the
> numbers right now, and I think writing an annotator of my own will be the
> best way to solidify the information. If no one has done it already, I
> thought I'd write a simple acronym annotator and disambiguator. The
> disambiguation would just be a co-occurrence count over a lookup window
> across a private corpus I have access to, e.g., word1 word2 word3 acronym1
> word4 word5 word6. I would provide specificity by excluding words that tend
> to occur frequently across instances of the acronyms with the same
> abbreviation.
>
> But if someone has already done it and is planning on releasing it, I'd
> hate to reinvent the wheel...
>
> JG
>
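
For what it's worth, the windowed co-occurrence idea described in the quoted
message could be sketched roughly as below. This is only an illustration in
plain Python, not a cTAKES or UIMA annotator: the function names
(context_window, train, disambiguate), the window size of 3, and the toy
sentences are all assumptions made up for the example, and the model covers a
single acronym whose training instances have already been labeled with a
sense.

```python
from collections import Counter, defaultdict


def context_window(tokens, index, size=3):
    """Lowercased tokens up to `size` positions on either side of `index`."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return [t.lower() for t in left + right]


def train(examples, size=3):
    """Build {sense: Counter of context words} for ONE acronym.

    `examples` is an iterable of (tokens, acronym_index, sense) tuples,
    i.e. sense-annotated occurrences (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for tokens, index, sense in examples:
        counts[sense].update(context_window(tokens, index, size))
    # Words seen with every sense do not help tell the senses apart,
    # so drop them (the "specificity" step from the quoted message).
    if len(counts) > 1:
        shared = set.intersection(*(set(c) for c in counts.values()))
    else:
        shared = set()
    return {
        sense: Counter({w: n for w, n in c.items() if w not in shared})
        for sense, c in counts.items()
    }


def disambiguate(model, tokens, index, size=3):
    """Return the sense whose training contexts overlap most, or None."""
    window = context_window(tokens, index, size)
    scores = {sense: sum(c[w] for w in window) for sense, c in model.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # no evidence either way
    return best


# Toy, made-up training data for the acronym "MS".
examples = [
    ("patient with MS has brain lesions".split(), 2, "multiple sclerosis"),
    ("patient with MS has a heart murmur".split(), 2, "mitral stenosis"),
]
model = train(examples)
guess = disambiguate(model, "MRI shows MS with brain lesions".split(), 2)
```

With only two training sentences this is obviously a toy; raw counts like
these favor whichever sense has more training examples, which is one reason a
real system needs a good amount of annotated data behind it.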
