Hi Tomasz,
The lowercasing is also done in the dictionary creation code.
Unless you want to make a database for the previous dictionary lookup module
(it looks like you don't), you shouldn't bother with the old dictionarytool.jar.
Use the newer dictionary-gui in sandbox instead.
The class there is org.apache.ctakes.dictionary.creator.util.TextTokenizer
In the getTokenizedText(..) method, line 177, just remove the .toLowerCase()

In the ctakes -fast module code you will need to replace the 
...dictionary.lookup2.util.FastLookupToken and remove the .toLowerCase() from 
the constructor, line 45.  You cannot extend that class as it is 
immutable.
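
To illustrate why the lowercasing has to go on both sides (dictionary creation
and lookup), here is a minimal, self-contained sketch. It does not use the
cTAKES classes: CaseDemo, normalize and the one-entry dictionary are
hypothetical stand-ins, and it assumes the UMLS term is stored as "AIDS". The
real TextTokenizer and FastLookupToken do more than this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not cTAKES code.
class CaseDemo {
    // Mimics a token normalizer in which lowercasing can be switched off.
    static String normalize(String text, boolean lowerCase) {
        return lowerCase ? text.toLowerCase(Locale.ROOT) : text;
    }

    // Builds a one-entry dictionary keyed by the normalized term, then
    // reports whether the normalized document token is found in it.
    static boolean matches(String dictionaryTerm, String documentToken, boolean lowerCase) {
        Map<String, String> cuiByTerm = new HashMap<>();
        cuiByTerm.put(normalize(dictionaryTerm, lowerCase), "C0001175");
        return cuiByTerm.containsKey(normalize(documentToken, lowerCase));
    }

    public static void main(String[] args) {
        System.out.println(matches("AIDS", "aids", true));   // true: both sides lowercased
        System.out.println(matches("AIDS", "aids", false));  // false: case is preserved
        System.out.println(matches("AIDS", "AIDS", false));  // true: exact-case match
    }
}
```

If only one side stops lowercasing, the keys and the tokens no longer agree,
which is why both places need the change.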

Sean

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, June 01, 2016 3:20 PM
To: dev@ctakes.apache.org
Subject: RE: cTAKES false positives, case-insensitivity

Another idea would be to create the dictionary without lowercasing the concept 
text and rare word in CUI_TERMS, but keep them as they are from the UMLS.

Do you happen to know which class / line is responsible for the lowercasing in 
the dictionarytool.jar? I would like to try this.

Regards,
Tomasz

________________________________________
From: Tomasz Oliwa [ol...@uchicago.edu]
Sent: Wednesday, June 01, 2016 11:07 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES false positives, case-insensitivity

Thank you all for the suggestions.

Sean, by "make the AE case-sensitive" do you mean writing an annotator that 
simply removes an annotation based on some criteria like case and semantic 
type? Or does cTAKES have such a switch already available?
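
For illustration, the kind of criteria-based filter described in this question
could be sketched as below. This is an assumption-laden sketch, not an
existing cTAKES switch: AcronymCaseFilter, its UPPERCASE_ONLY list and the
boolean semantic-type flag are all hypothetical, and a real annotator would
read the covered text and type from the annotations in the CAS.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-processing rule, not part of cTAKES.
class AcronymCaseFilter {
    // Assumed list of acronyms that should only match when written in uppercase.
    static final Set<String> UPPERCASE_ONLY = new HashSet<>(Arrays.asList("AIDS", "ALL"));

    // Returns false when a disease/disorder mention is a listed acronym
    // that was not written fully in uppercase in the note.
    static boolean keep(String coveredText, boolean isDiseaseDisorder) {
        if (!isDiseaseDisorder) {
            return true;  // other semantic types are left untouched
        }
        String upper = coveredText.toUpperCase(Locale.ROOT);
        if (!UPPERCASE_ONLY.contains(upper)) {
            return true;  // not one of the ambiguous acronyms
        }
        return coveredText.equals(upper);
    }
}
```

With this rule, "aids" in "hearing aids" and "all" in "values are all stable"
would be rejected, while "AIDS" and "ALL" written in uppercase would be kept.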

________________________________________
From: Finan, Sean [sean.fi...@childrens.harvard.edu]
Sent: Wednesday, June 01, 2016 10:56 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES false positives, case-insensitivity

Oh - I should mention:
Increasing the minimum required span can cause unwanted false negatives.  A 
minimum of 5 will also get rid of things like "arm" and "foot".  You could make 
your own AE that avoids this by getting rid of only disease/disorder mentions 
with character count < 5.  That would probably be better.  Also maybe meds with 
count < 5.  You can even make the AE case-sensitive in case that helps.
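
A minimal sketch of that filtering rule, assuming a threshold of 5 characters.
The Mention and Group types here are hypothetical stand-ins so the example is
self-contained; in an actual AE the same test would be applied to the cTAKES
annotations in the JCas, and rejected mentions would be removed from the
indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types, not cTAKES classes.
class ShortMentionFilter {
    enum Group { DISEASE_DISORDER, MEDICATION, ANATOMICAL_SITE, OTHER }

    static final class Mention {
        final String text;
        final Group group;

        Mention(String text, Group group) {
            this.text = text;
            this.group = group;
        }
    }

    // Only disease/disorder and medication mentions are subject to the
    // minimum length; everything else (e.g. "arm", "foot") is kept.
    static boolean keep(Mention mention, int minLength) {
        boolean restricted = mention.group == Group.DISEASE_DISORDER
                || mention.group == Group.MEDICATION;
        return !restricted || mention.text.length() >= minLength;
    }

    // Returns a new list containing only the mentions that pass keep().
    static List<Mention> filter(List<Mention> mentions, int minLength) {
        List<Mention> kept = new ArrayList<>();
        for (Mention mention : mentions) {
            if (keep(mention, minLength)) {
                kept.add(mention);
            }
        }
        return kept;
    }
}
```

Note that a pure length rule also drops legitimate short disease or drug
names, so the threshold is a trade-off rather than a fix.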

Sean

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Wednesday, June 01, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives annotated with cTAKES that seem to come from 
case-insensitivity of the annotation lookup, such as:

Pt uses hearing aids. -> "aids" is found as DiseaseDisorderMention 
cui=C0001175, Acquired Immunodeficiency Syndrome

Pt values are all stable. -> "all" is found as DiseaseDisorderMention 
cui=C1961102, Precursor Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or to resolve such issues?

How do you deal with such false positives, so that they are not matched?

Regards,
Tomasz
