Hi Gandhi, Abilash Mathew,

This is a common problem stemming from the nature of the UMLS and automated 
dictionary creation.  I am still (ever so slowly) improving the dictionary 
creator code.  If anybody can devote some time to help, that would be great.

Anyway, if you are using ctakes trunk, the -fast dictionary lookup has a new 
capability.  Basically, you can "blacklist" texts that you do not want annotated.

1. Create a bar-separated value (bsv) file containing the ctakes numeric code 
for a semantic group and the text that you don't want.
2. Set the parameter "Blacklist" to point to the file.

The ctakes numeric semantic group codes can be found in the CONST class in 
ctakes-type-system.  The pertinent codes:
1 = medication
2 = disease / disorder
3 = sign / symptom
5 = procedure
6 = anatomical site
9 = lab
0 = unknown
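
If you generate the file from a script, the codes above can be kept in a small lookup table.  The following is only a minimal Python sketch and is not part of ctakes: the names SEMANTIC_GROUP_CODES and blacklist_line are made up for illustration, and the numbers are simply the ones listed above.

```python
# Hypothetical helper, not part of ctakes.
# The codes are copied from the list above (CONST class in ctakes-type-system).
SEMANTIC_GROUP_CODES = {
    "medication": 1,
    "disease/disorder": 2,
    "sign/symptom": 3,
    "procedure": 5,
    "anatomical site": 6,
    "lab": 9,
    "unknown": 0,
}


def blacklist_line(group: str, text: str) -> str:
    """Format one bar-separated blacklist line: <code>|<text>."""
    if "|" in text:
        # The bar is the field separator, so it cannot appear in the text.
        raise ValueError("text must not contain the bar separator")
    return f"{SEMANTIC_GROUP_CODES[group]}|{text}"


print(blacklist_line("procedure", "test"))       # 5|test
print(blacklist_line("medication", "medicine"))  # 1|medicine
```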

Example:

// My Blacklist File.
# double-slash and hash indicate comment lines.
3|Finding
5|test
5|Procedure
1|Page
5|treatment
1|medicine
1|Drug
// Not sure what "finicky thing" belongs to, so I'll just add a bunch:
1|finicky thing
2|finicky thing
3|finicky thing
5|finicky thing
6|finicky thing
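
To check a blacklist file before handing it to ctakes, the format can be read back with a few lines of Python.  This is a sketch of the file format as described here, not the ctakes reader itself; parse_blacklist is a hypothetical name, and skipping blank lines and trimming surrounding whitespace are assumptions.

```python
from typing import Iterable, List, Tuple


def parse_blacklist(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (semantic group code, text) pairs from bsv blacklist lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Double-slash and hash mark comment lines; blank lines are
        # skipped too (an assumption of this sketch).
        if not line or line.startswith("//") or line.startswith("#"):
            continue
        code, sep, text = line.partition("|")
        if not sep or not code.strip().isdigit() or not text.strip():
            raise ValueError(f"malformed blacklist line: {raw!r}")
        entries.append((int(code), text.strip()))
    return entries


sample = [
    "// My Blacklist File.",
    "# double-slash and hash indicate comment lines.",
    "3|Finding",
    "1|finicky thing",
]
print(parse_blacklist(sample))  # [(3, 'Finding'), (1, 'finicky thing')]
```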

The semantic group codes are there so that you can (for instance) make ctakes 
ignore "1|Drug" as a generic indication of some medication but keep "drug" as a 
procedure.  Compare "aspirin is a drug" with "the patient was drugged 
before proceeding".

Once you have the file, set "Blacklist" to point to the file as you would other 
ctakes pipeline parameters.  

Matching of the texts in the blacklist is case insensitive.  There is no 
difference between adding "1|DRUG" and "1|drug".
If you do want case sensitivity, you can use a blacklist file pointed to by 
the parameter "CsBlacklist".
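
The difference between the two parameters can be pictured with a small Python sketch.  It only models the behavior described in this message and is not how ctakes implements it; the class name Blacklist and the method blocks are hypothetical, and comparing the whole candidate text (rather than individual tokens) is an assumption.

```python
from typing import Iterable, Tuple

Entry = Tuple[int, str]  # (semantic group code, text)


class Blacklist:
    """Toy model of a case-insensitive list plus a case-sensitive list."""

    def __init__(self,
                 insensitive: Iterable[Entry] = (),
                 sensitive: Iterable[Entry] = ()):
        # Case-insensitive entries are stored lowercased.
        self._insensitive = {(code, text.lower()) for code, text in insensitive}
        # Case-sensitive entries are stored exactly as written.
        self._sensitive = set(sensitive)

    def blocks(self, code: int, text: str) -> bool:
        """True if this text should be ignored for this semantic group."""
        return ((code, text.lower()) in self._insensitive
                or (code, text) in self._sensitive)


bl = Blacklist(insensitive=[(1, "Drug")], sensitive=[(5, "RELEASE")])
print(bl.blocks(1, "DRUG"))     # True: "1|Drug" matches in any case
print(bl.blocks(5, "drug"))     # False: only the medication sense is listed
print(bl.blocks(5, "RELEASE"))  # True: exact case matches
print(bl.blocks(5, "release"))  # False: the case-sensitive entry requires "RELEASE"
```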

I think that is about it.

Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Wednesday, October 25, 2017 9:14 AM
To: dev@ctakes.apache.org
Subject: RE: false positive [EXTERNAL]

Hi Abilash,

I'm not sure how much sense this will make, but in the custom annotator we 
wrote on top of cTAKES, we resolved these false positives to an extent by 
using the commonly used English words metadata available from OpenNLP.
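
The general idea can be sketched in Python as a post-filter over the annotations.  This is an illustration only, not the custom annotator itself: COMMON_WORDS is a tiny placeholder standing in for a real common-words list, and drop_common_words is a hypothetical name.

```python
from typing import FrozenSet, List, Tuple

# Placeholder word list for illustration; a real list would be loaded
# from whatever common-English-words resource is available.
COMMON_WORDS: FrozenSet[str] = frozenset({"test", "page", "release", "division"})

Annotation = Tuple[str, str]  # (covered text, semantic group name)


def drop_common_words(annotations: List[Annotation],
                      common_words: FrozenSet[str] = COMMON_WORDS) -> List[Annotation]:
    """Keep only annotations whose text is not a common English word."""
    return [(text, group) for text, group in annotations
            if text.lower() not in common_words]


found = [("test", "Procedure"), ("reconstruction", "Procedure"), ("RELEASE", "Procedure")]
print(drop_common_words(found))  # [('reconstruction', 'Procedure')]
```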

Regards,
Gandhi

-----Original Message-----
From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com]
Sent: Wednesday, October 25, 2017 3:57 PM
To: dev@ctakes.apache.org
Subject: false positive

Hi all,

We are seeing some false positives identified by cTAKES after testing a couple 
of medical record samples. Can anyone help us make cTAKES ignore these words 
so that they are not tagged incorrectly?

Word            Finding
test            Procedure
Page            Procedure
treatment       Procedure
medicine        Drug
medication      Drug
attachments     Procedure
RELEASE         Procedure
reconstruction  Procedure
DOB             Drug
Procedure       Procedure
Division        Procedure


Thanks,
Abilash Mathew