RE: cTAKES false positives, case-insensitivity

Finan, Sean Wed, 01 Jun 2016 08:50:02 -0700

Hi Tomasz,

cTAKES lookup (both original and fast) is case insensitive by design.  There 
have been brief discussions about changing this behavior, but things like 
capitalized form entries, list headings, and plain old first-word 
capitalization have kept a change from being implemented.


One big interest in the community is word sense disambiguation, which would 
allow the culling of terms based upon the likelihood that they do not properly 
fit in context.

Culling could also be done based upon normal frequency of the term appearing in 
text.  Or you could create an annotation engine that culls based upon some 
other requirement, such as semantic type.
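
As a rough illustration of culling on "some other requirement", here is a 
minimal sketch in plain Java (no UIMA or cTAKES classes, so it runs on its 
own).  The Hit class, the AcronymCaseCuller class and the hand-picked CUI list 
are all hypothetical; the rule is simply "drop a hit when the concept is known 
to be an acronym but the matched text is all lower case".  A real annotation 
engine would apply the same test to annotations in the JCas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a dictionary hit; this is NOT a cTAKES type.
final class Hit {
    final String coveredText; // the text span that matched
    final String cui;         // the concept the lookup assigned

    Hit(String coveredText, String cui) {
        this.coveredText = coveredText;
        this.cui = cui;
    }
}

final class AcronymCaseCuller {
    // Assumption: a hand-maintained set of CUIs whose usual surface form is an
    // upper-case acronym (here, the two concepts from the examples in this thread).
    static final Set<String> ACRONYM_CUIS =
            new HashSet<>(Arrays.asList("C0001175", "C1961102"));

    // True when the hit should be dropped: the concept is on the acronym list
    // but the matched text was written entirely in lower case.
    static boolean shouldCull(Hit hit) {
        String text = hit.coveredText;
        return ACRONYM_CUIS.contains(hit.cui)
                && text.equals(text.toLowerCase(Locale.ROOT));
    }

    // Returns a new list containing only the hits that survive the rule.
    static List<Hit> cull(List<Hit> hits) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (!shouldCull(hit)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("aids", "C0001175"),  // "Pt uses hearing aids."
                new Hit("all", "C1961102"),   // "Pt values are all stable."
                new Hit("AIDS", "C0001175")); // a genuine upper-case mention
        for (Hit hit : cull(hits)) {
            System.out.println(hit.coveredText + " " + hit.cui);
        }
    }
}
```

Only the upper-case "AIDS" hit survives.  Note that a sentence-initial "All" 
would also survive this rule, which is the same first-word capitalization 
problem mentioned above.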

For your two specific examples, you can prevent a lot of false positive 
acronyms and abbreviations by increasing the required character count cutoff 
for terms.  This can be done by setting the UIMA parameter "minimumSpan" to 5 
(getting rid of "AIDS" but keeping "APSGN").  You can do this using the old 
XML style or uimaFIT, something like:

AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class, 
JCasTermAnnotator.PARAM_MIN_SPAN_KEY, 5 )
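
To make the effect of the cutoff concrete, here is a small sketch in plain 
Java.  It is not the cTAKES implementation; MinimumSpanDemo and filter are 
made-up names, and it assumes the cutoff is inclusive, which is consistent 
with "APSGN" (5 characters) being kept at a cutoff of 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MinimumSpanDemo {
    // Keep a term only if it has at least minimumSpan characters.
    // Assumption: the cutoff is inclusive, so a 5-character term
    // survives a cutoff of 5.
    static List<String> filter(List<String> terms, int minimumSpan) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term.length() >= minimumSpan) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aids", "all", "APSGN");
        System.out.println(filter(terms, 5)); // prints [APSGN]
        System.out.println(filter(terms, 3)); // prints [aids, all, APSGN]
    }
}
```

The trade-off is that a cutoff of 5 also drops every legitimate term shorter 
than 5 characters, not just the false positives.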

Sean


-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, June 01, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives annotated with cTAKES that seem to come from 
case-insensitivity of the annotation lookup, such as:

Pt uses hearing aids. -> "aids" is found as DiseaseDisorderMention 
cui=C0001175, Acquired Immunodeficiency Syndrome

Pt values are all stable. -> "all" is found as DiseaseDisorderMention 
cui=C1961102, Precursor Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or to resolve such issues?

How do you deal with such false positives, so that they are not matched?

Regards,
Tomasz
