Another idea would be to create the dictionary without lowercasing the concept text and rare word in CUI_TERMS, keeping them exactly as they appear in the UMLS.
Do you happen to know which class / line is responsible for the lowercasing in dictionarytool.jar? I would like to try this.

Regards,
Tomasz

________________________________________
From: Tomasz Oliwa [ol...@uchicago.edu]
Sent: Wednesday, June 01, 2016 11:07 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES false positives, case-insensitivity

Thank you all for the suggestions.

Sean, by "make the AE case-sensitive" do you mean writing an annotator that simply removes an annotation based on some criteria like case and semantic type? Or does cTAKES have such a switch already available?

________________________________________
From: Finan, Sean [sean.fi...@childrens.harvard.edu]
Sent: Wednesday, June 01, 2016 10:56 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES false positives, case-insensitivity

Oh - I should mention: Increasing the minimum required span can cause unwanted false negatives. A minimum of 5 will get rid of things like "arm" and "foot". You could make your own AE that changes this by getting rid of only disease/disorder mentions with a character count < 5. That would probably be better. Also maybe meds with a count < 5. You can even make the AE case-sensitive in case that helps.

Sean

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Wednesday, June 01, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives annotated with cTAKES that seem to come from the case-insensitivity of the annotation lookup, such as:

Pt uses hearing aids.
-> "aids" is found as DiseaseDisorderMention cui=C0001175, Acquired Immunodeficiency Syndrome

Pt values are all stable.
-> "all" is found as DiseaseDisorderMention cui=C1961102, Precursor Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or resolve such issues? How do you deal with such false positives, so that they are not matched?

Regards,
Tomasz
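
For illustration, the filtering rule Sean describes (drop only disease/disorder and medication mentions shorter than 5 characters, optionally taking case into account) might look roughly like the sketch below. This is plain Java without UIMA or cTAKES on the classpath: `Mention`, `ShortMentionFilter`, and the string type names are hypothetical stand-ins rather than cTAKES classes, and the "keep short mentions only when they are written in all caps" rule is one assumed reading of "case-sensitive". In a real pipeline the same check would have to live inside a custom annotator that removes the unwanted annotations from the CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a cTAKES annotation; NOT a cTAKES class. */
final class Mention {
    final String coveredText; // the text exactly as it appears in the note
    final String typeName;    // e.g. "DiseaseDisorderMention" (used here as a plain label)

    Mention(String coveredText, String typeName) {
        this.coveredText = coveredText;
        this.typeName = typeName;
    }
}

/** Sketch of the post-processing rule; names and thresholds are assumptions. */
final class ShortMentionFilter {
    // Threshold taken from the thread: mentions with fewer than 5 characters are suspect.
    static final int MIN_LENGTH = 5;

    // Only these mention types are filtered; anatomical sites such as "arm" are left alone.
    static final Set<String> FILTERED_TYPES =
            Set.of("DiseaseDisorderMention", "MedicationMention");

    /** True if the text has at least one letter and every letter is upper case. */
    static boolean isAllUpperCase(String text) {
        boolean sawLetter = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                if (!Character.isUpperCase(c)) {
                    return false;
                }
                sawLetter = true;
            }
        }
        return sawLetter;
    }

    /** Decide whether a single mention survives the filter. */
    static boolean keep(Mention m) {
        if (!FILTERED_TYPES.contains(m.typeName)) {
            return true; // type is not filtered at all
        }
        if (m.coveredText.length() >= MIN_LENGTH) {
            return true; // long enough to trust the dictionary match
        }
        // Short mention: keep it only if it is written like an acronym ("AIDS", "ALL").
        return isAllUpperCase(m.coveredText);
    }

    /** Return a new list containing only the mentions that pass keep(). */
    static List<Mention> filter(List<Mention> mentions) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : mentions) {
            if (keep(m)) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Mention> found = List.of(
                new Mention("aids", "DiseaseDisorderMention"),
                new Mention("AIDS", "DiseaseDisorderMention"),
                new Mention("all", "DiseaseDisorderMention"),
                new Mention("foot", "AnatomicalSiteMention"));
        for (Mention m : filter(found)) {
            System.out.println(m.coveredText + " (" + m.typeName + ")");
        }
    }
}
```

With the two examples from the original message, "aids" and "all" are dropped while "AIDS" and "foot" are kept. Two caveats: notes typed entirely in capitals would defeat the all-caps check, and genuinely short lowercase disease or drug names would become false negatives, so the threshold and type list would need tuning on real data.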