This is a form of word sense disambiguation, and there is a lot of literature on the subject. Co-occurrence is one way of doing it, though not necessarily the best; it needs a ton of annotated data to work well.
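For what it's worth, here is a rough sketch of the windowed co-occurrence idea from the quoted message, just to make the data requirement concrete. It is plain Python rather than a UIMA annotator, and everything in it is an assumption for illustration: the function names, the window size of 3, the two toy sentences, and the simplification that a word is "non-specific" if it shows up with every sense of the acronym.

```python
from collections import Counter, defaultdict

def context_window(tokens, index, size=3):
    """Return up to `size` lowercased tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return [t.lower() for t in left + right]

def train(examples, size=3):
    """Count context words per sense from (tokens, acronym_index, sense) tuples."""
    counts = defaultdict(Counter)
    for tokens, index, sense in examples:
        counts[sense].update(context_window(tokens, index, size))
    # Assumed specificity filter: drop words seen with every sense,
    # since they cannot help tell the senses apart.
    if len(counts) > 1:
        shared = set.intersection(*(set(c) for c in counts.values()))
        for sense_counts in counts.values():
            for word in shared:
                del sense_counts[word]
    return counts

def disambiguate(counts, tokens, index, size=3):
    """Pick the sense whose training contexts overlap most with this window."""
    window = context_window(tokens, index, size)
    scores = {sense: sum(c[w] for w in window) for sense, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None means no evidence

# Toy, made-up annotated examples for the acronym "MS".
examples = [
    ("patient with relapsing MS on interferon therapy".split(), 3,
     "multiple sclerosis"),
    ("echo shows severe MS with valve calcification".split(), 3,
     "mitral stenosis"),
]
counts = train(examples)
guess = disambiguate(counts, "murmur and severe MS with valve thickening".split(), 3)
print(guess)
```

With only two training sentences, any window that shares no words with them comes back as None, which is the annotated-data problem in miniature.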
On Thu, Aug 21, 2014 at 9:08 PM, John Green <[email protected]> wrote:
> Are there any acronym annotators and disambiguators? What are people doing
> in production elsewhere? I'm learning the heart of cTAKES and UIMA by the
> numbers right now and I think writing an annotator of my own will be the
> best way to solidify the information. If no one has it done already, I
> thought I'd write a simple acronym annotator and disambiguator. The
> disambiguation would just be a co-occurrence over a lookup window across a
> private corpus I have access to, e.g., word1 word2 word3 acronym1 word4
> word5 word6. I would provide specificity by excluding words that tend to
> occur frequently across instances of the acronyms with the same
> abbreviation.
>
> But, if someone has already done it and is planning on releasing it, I hate
> to reproduce wheels...
>
> JG
>
