The LVG annotator creates an enormous number of "lemmas" for every WordToken in the CAS, and I'm wondering what the original purpose was. I think this is probably only a minor bottleneck for speed, but it is a pretty big space hog (at least 50% of the size of the XMI files in my tests).
As of right now I'm not sure whether any downstream components use these lemmas, and on manual inspection the precision seems pretty abysmal (most of them are nonsensical as lexical variants). So, as I said, I'm just wondering if we can revisit why cTAKES generates so many and whether that component can be optimized.
Thanks,
Tim
