Hi again,

The blacklists should now accept comma-separated semantic group codes.
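
For anyone who wants to pre-check a blacklist file outside of ctakes, here is a rough Python sketch of how such lines could be read. It is not the ctakes implementation: the function name parse_blacklist and the choice to lower-case the text (mirroring the case-insensitive "Blacklist" behavior described below) are my own assumptions for illustration.

```python
from typing import Dict, Iterable, Set

# Comment markers taken from the example blacklist file quoted below.
COMMENT_PREFIXES = ("//", "#")


def parse_blacklist(lines: Iterable[str]) -> Dict[int, Set[str]]:
    """Map each semantic group code to the lower-cased texts blacklisted for it.

    Hypothetical helper: accepts "code|text" and "code,code,...|text" lines.
    """
    blacklist: Dict[int, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(COMMENT_PREFIXES):
            continue  # skip blank lines and comments
        codes, sep, text = line.partition("|")
        text = text.strip()
        if not sep or not text:
            continue  # skip lines without a bar or without any text
        for code in codes.split(","):
            code = code.strip()
            if code.isdigit():
                blacklist.setdefault(int(code), set()).add(text.lower())
    return blacklist


sample = [
    "// My Blacklist File.",
    "# double-slash and hash indicate comment lines.",
    "3|Finding",
    "1|Drug",
    "1,2,3,5,6|finicky thing",
]
blacklist = parse_blacklist(sample)
```

With the sample above, "finicky thing" ends up under each of the five listed groups, which is the same result as writing five separate lines.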

Thanks,
Sean

-----Original Message-----
From: Kean Kaufmann [mailto:k...@recordsone.com.INVALID] 
Sent: Wednesday, October 25, 2017 10:38 AM
To: dev@ctakes.apache.org
Subject: Re: false positive [EXTERNAL]

Sean, thanks!  Blacklisting is essential, and making it category-specific is a 
really nice touch.

Dispatch from the trenches, FWIW:

a) The blacklist can get quite big, e.g. when mining common wordlists.  To 
reduce bloat, might you allow comma-separated lists of semantic groups in the 
first field? e.g.

1,2,3,4,5,6|finicky thing

b) Have you found the "disease/disorder" vs. "sign/symptom" distinction useful? 
 For CAC purposes, we've introduced a superset ProblemMention into the type 
system so we don't have to bother with it.  Maybe an extra semantic group 
("23"?) would come in similarly handy.
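
To make (b) concrete, here is a minimal Python sketch of what a superset group could mean for a blacklist entry. Everything in it is hypothetical: "23" is only the code floated above, not an existing ctakes constant, and GROUP_ALIASES and expand_entry are names made up for illustration.

```python
from typing import List, Tuple

# Hypothetical alias table: 23 stands for "disease/disorder or sign/symptom".
GROUP_ALIASES = {23: (2, 3)}


def expand_entry(entry: str) -> List[Tuple[int, str]]:
    """Expand one "codes|text" entry into (group code, text) pairs.

    Alias codes are replaced by the groups they cover; duplicates are dropped
    while keeping the original order.
    """
    codes, _, text = entry.partition("|")
    expanded: List[Tuple[int, str]] = []
    for code in codes.split(","):
        number = int(code.strip())
        for group in GROUP_ALIASES.get(number, (number,)):
            pair = (group, text)
            if pair not in expanded:
                expanded.append(pair)
    return expanded


pairs = expand_entry("23|finicky thing")
```

Here a single "23|finicky thing" line stands in for the separate "2|..." and "3|..." lines.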

Appreciatively,
Kean




On Wed, Oct 25, 2017 at 9:46 AM, Finan, Sean < 
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi, Abilash Mathew,
>
> This is a common problem stemming from the nature of the UMLS and 
> automated dictionary creation.  I am still (ever so slowly) improving 
> the dictionary creator code.  If anybody can devote some time to help 
> that would be great.
>
> Anyway, if you are using ctakes trunk there is a new capability of the 
> -fast dictionary lookup.  Basically, you can "blacklist" texts that 
> you do not want.
>
> 1. Create a bar-separated value (bsv) file containing the ctakes 
> numeric code for a semantic group and the text that you don't want.
> 2. Set the parameter "Blacklist" to point to the file.
>
> The ctakes numeric semantic group codes can be found in the CONST 
> class in ctakes-type-system.  The pertinent codes:
> 1 = medication
> 2 = disease / disorder
> 3 = sign / symptom
> 5 = procedure
> 6 = anatomical site
> 9 = lab
> 0 = unknown
>
> Example:
>
> // My Blacklist File.
> # double-slash and hash indicate comment lines.
> 3|Finding
> 5|test
> 5|Procedure
> 1|Page
> 5|treatment
> 1|medicine
> 1|Drug
> // Not sure what "finicky thing" belongs to, so I'll just add a bunch:
> 1|finicky thing
> 2|finicky thing
> 3|finicky thing
> 5|finicky thing
> 6|finicky thing
>
> The semantic group codes are there so that you can (for instance) make 
> ctakes ignore "1|Drug" as a generic indication of some medication but 
> keep "drug" as a procedure.  Compare "aspirin is a drug" with 
> "the patient was drugged before proceeding".
>
> Once you have the file, set "Blacklist" to point to the file as you 
> would other ctakes pipeline parameters.
>
> Use of the texts in the blacklist is case insensitive.  There is no 
> difference between adding "1|DRUG" and "1|drug".
> If you do want case sensitivity ...  you can use a blacklist file 
> pointed to by the parameter "CsBlacklist".
>
> I think that is about it.
>
> Sean
>
> -----Original Message-----
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Wednesday, October 25, 2017 9:14 AM
> To: dev@ctakes.apache.org
> Subject: RE: false positive [EXTERNAL]
>
> Hi Abilash,
>
> I'm not sure how much sense this will make, but in the custom annotator 
> we wrote on top of cTAKES, we resolved these false positives to an 
> extent by using the commonly used English words metadata available from 
> OpenNLP.
>
> Regards,
> Gandhi
>
> -----Original Message-----
> From: abilash.mat...@cognizant.com 
> [mailto:abilash.mat...@cognizant.com]
> Sent: Wednesday, October 25, 2017 3:57 PM
> To: dev@ctakes.apache.org
> Subject: false positive
>
> Hi all,
>
> We are seeing some false positives identified by cTAKES after we 
> tested a couple of medical record samples. Can anyone help us with how 
> to keep these words from being tagged incorrectly?
>
> Word            Finding
> ----            -------
> test            Procedure
> Page            Procedure
> treatment       Procedure
> medicine        Drug
> medication      Drug
> attachments     Procedure
> RELEASE         Procedure
> reconstruction  Procedure
> DOB             Drug
> Procedure       Procedure
> Division        Procedure
>
>
> Thanks,
> Abilash Mathew
>
