Hi Gandhi, Abilash Mathew,

This is a common problem stemming from the nature of the UMLS and automated dictionary creation. I am still (ever so slowly) improving the dictionary creator code. If anybody can devote some time to help, that would be great.

Anyway, if you are using cTAKES trunk, there is a new capability in the -fast dictionary lookup. Basically, you can "blacklist" texts that you do not want.

1. Create a bar-separated value (bsv) file containing the cTAKES numeric code for a semantic group and the text that you don't want.
2. Set the parameter "Blacklist" to point to the file.

The cTAKES numeric semantic group codes can be found in the CONST class in ctakes-type-system. The pertinent codes:

1 = medication
2 = disease / disorder
3 = sign / symptom
5 = procedure
6 = anatomical site
9 = lab
0 = unknown

Example:

// My Blacklist File.
# double-slash and hash indicate comment lines.
3|Finding
5|test
5|Procedure
1|Page
5|treatment
1|medicine
1|Drug
// Not sure what "finicky thing" belongs to, so I'll just add a bunch:
1|finicky thing
2|finicky thing
3|finicky thing
5|finicky thing
6|finicky thing

The semantic group codes are there so that you can (for instance) make cTAKES ignore "1|Drug" as a generic indication of some medication but keep "drug" as a procedure. For instance, "aspirin is a drug" versus "the patient was drugged before proceeding".

Once you have the file, set "Blacklist" to point to the file as you would other cTAKES pipeline parameters.

Use of the texts in the blacklist is case insensitive. There is no difference between adding "1|DRUG" and "1|drug". If you do want case sensitivity, you can use a blacklist file pointed to by the parameter "CsBlacklist".

I think that is about it.

Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, October 25, 2017 9:14 AM
To: dev@ctakes.apache.org
Subject: RE: false positive [EXTERNAL]

Hi Abilash,

I'm not sure how much sense this will make, but in the custom annotator we wrote on top of cTAKES, we resolved these false positives to an extent by using the commonly used English words metadata available from OpenNLP.

Regards,
Gandhi

-----Original Message-----
From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com]
Sent: Wednesday, October 25, 2017 3:57 PM
To: dev@ctakes.apache.org
Subject: false positive

Hi all,

We are seeing some false positives identified by CTAKES after testing a couple of medical record samples. Can anyone help us with how to keep these words from being tagged incorrectly?

Word             Finding
test             Procedure
Page             Procedure
treatment        Procedure
medicine         Drug
medication       Drug
attachments      Procedure
RELEASE          Procedure
reconstruction   Procedure
DOB              Drug
Procedure        Procedure
Division         Procedure

Thanks,
Abilash Mathew
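
A note on the blacklist file format described in Sean's reply: before pointing the "Blacklist" parameter at a file, it may help to sanity-check the file offline. The following is only a rough Python sketch of the format as described in that message (bar-separated "code|text" lines, "//" and "#" comment lines, case-insensitive text). It is not cTAKES code, the function names parse_blacklist and is_blacklisted are made up for illustration, and the real cTAKES parser may treat edge cases differently.

```python
from typing import Iterable, Set, Tuple

# Semantic group codes exactly as listed in the reply; nothing else is assumed.
SEMANTIC_GROUPS = {
    0: "unknown",
    1: "medication",
    2: "disease / disorder",
    3: "sign / symptom",
    5: "procedure",
    6: "anatomical site",
    9: "lab",
}


def parse_blacklist(lines: Iterable[str]) -> Set[Tuple[int, str]]:
    """Parse 'code|text' lines into a set of (code, lowercased text) pairs."""
    entries = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and lines starting with // or # are skipped as comments.
        if not line or line.startswith("//") or line.startswith("#"):
            continue
        code_text, sep, text = line.partition("|")
        if not sep or not text.strip():
            raise ValueError(f"line {number}: expected 'code|text', got {raw!r}")
        code = int(code_text.strip())  # raises ValueError for a non-numeric code
        if code not in SEMANTIC_GROUPS:
            raise ValueError(f"line {number}: unknown semantic group code {code}")
        # Lowercase the text to mimic the case-insensitive "Blacklist" behaviour.
        entries.add((code, text.strip().lower()))
    return entries


def is_blacklisted(entries: Set[Tuple[int, str]], code: int, text: str) -> bool:
    """Check whether a text is blacklisted for one semantic group (hypothetical helper)."""
    return (code, text.strip().lower()) in entries


sample = """\
// My Blacklist File.
# double-slash and hash indicate comment lines.
3|Finding
5|test
1|Drug
""".splitlines()

entries = parse_blacklist(sample)
print(is_blacklisted(entries, 1, "DRUG"))  # True: "drug" is blocked as a medication
print(is_blacklisted(entries, 5, "drug"))  # False: "drug" as a procedure is kept
```

The lookup is keyed on the (code, text) pair rather than on the text alone, which mirrors the point in the reply that "1|Drug" can be ignored while "drug" as a procedure is still kept.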