Re: cTakes Annotation Comparison

Kim Ebert Fri, 19 Dec 2014 10:40:00 -0800

Sean,

I don't think that would be an issue since both the rare word lookup and
the first word lookup are using UMLS 2011AB. Or is the rare word lookup
using a different dictionary?


I would expect roughly similar results between the two when it comes to
differences between UMLS versions.

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com <mailto:greg.hub...@imatsolutions.com>
On 12/19/2014 11:31 AM, Finan, Sean wrote:
> One quick mention:
>
> The cTakes dictionaries are built with UMLS 2011AB.  If the Human annotations 
> were not done using the same UMLS version then there WILL be differences in 
> CUI and Semantic group.  I don't have time to go into it with details, 
> examples, etc. just be aware that every 6 months cuis are added, removed, 
> deprecated, and moved from one TUI to another.
>
> Sean
>
> -----Original Message-----
> From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu] 
> Sent: Friday, December 19, 2014 1:28 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTakes Annotation Comparison
>
> Several thoughts:
> 1. The ShARe corpus annotates only mentions of type Diseases/Disorders and 
> only Anatomical Sites associated with a Disease/Disorder. This is by design. 
> cTAKES annotates all mentions of types Diseases/Disorders, Signs/Symptoms, 
> Procedures, Medications and Anatomical Sites. Therefore you will get MANY 
> more annotations with cTAKES. Eventually the ShARe corpus will be expanded to 
> the other types.
>
> 2. Keeping (1) in mind, you can approximately estimate the 
> precision/recall/f1 of cTAKES on the ShARe corpus if you output only mentions 
> of type Disease/Disorder. 
>
> 3. Could you send us the list of files you use from ShARe to test? We have 
> the corpus and would like to run against it as well.
>
> Hope this makes sense...
> --Guergana
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] 
> Sent: Friday, December 19, 2014 1:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTakes Annotation Comparison
>
> Our analysis against the human adjudicated gold standard from this SHARE 
> corpus is using a simple check to see if the cTakes output included the 
> annotation specified by the gold standard. The initial results I reported 
> were for exact matches of CUI and text span.  Only exact matches were counted.
>
> It looks like if we also count as matches cTakes annotations with a matching 
> CUI and a text span that overlaps the gold standard text span then the 
> matches increase to 224 matching annotations for the FastUMLS pipeline and 
> 2319 for the old pipeline.
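
To make the two matching rules above concrete, here is a minimal sketch, assuming each annotation is a (begin, end, CUI) record with character offsets. The `Annotation` type, the function names, and the sample CUIs used with it are hypothetical and are not taken from the compare tool.

```python
from typing import Callable, Iterable, NamedTuple


class Annotation(NamedTuple):
    begin: int  # character offset where the span starts (inclusive)
    end: int    # character offset where the span ends (exclusive)
    cui: str    # UMLS concept identifier


def exact_match(gold: Annotation, system: Annotation) -> bool:
    """Strict rule: same CUI and identical text span."""
    return gold == system


def overlap_match(gold: Annotation, system: Annotation) -> bool:
    """Relaxed rule: same CUI and spans that share at least one character."""
    return (gold.cui == system.cui
            and system.begin < gold.end
            and gold.begin < system.end)


def count_matches(gold: Iterable[Annotation],
                  system: Iterable[Annotation],
                  matcher: Callable[[Annotation, Annotation], bool]) -> int:
    """Count gold annotations that have at least one matching system annotation."""
    system = list(system)
    return sum(1 for g in gold if any(matcher(g, s) for s in system))
```

Every exact match is also an overlap match, so the relaxed count can only stay the same or grow, which is consistent with the small increases reported above.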
>
> The question was also asked about annotations in the cTakes output that were 
> not in the human adjudicated gold standard. The answer is yes, there were a 
> lot of additional annotations made by cTakes that don't appear to be in the 
> gold standard. We haven't analyzed that yet, but it looks like the gold 
> standard we are using may only have Disease_Disorder annotations.
>
>
>
> [image: IMAT Solutions] <http://imatsolutions.com>
> Bruce Tietjen
> Senior Software Engineer
> [image: Mobile:] 801.634.1547
> bruce.tiet...@imatsolutions.com
>
> On Fri, Dec 19, 2014 at 9:54 AM, Miller, Timothy < 
> timothy.mil...@childrens.harvard.edu> wrote:
>> Thanks Kim,
>> This sounds interesting though I don't totally understand it. Are you 
>> saying that extraction performance for a given note depends on which 
>> order the note was in the processing queue? If so that's pretty bad! 
>> If you (or anyone else who understands this issue) has a concrete 
>> example I think that might help me understand what the problem is/was.
>>
>> Even though, as Pei mentioned, we are going to try moving the 
>> community to the faster dictionary, I would like to understand better 
>> just to help myself avoid issues of this type going forward (and 
>> verify the new dictionary doesn't use similar logic).
>>
>> Also, when we finish annotating the sample notes, might we use that as 
>> a point of comparison for the two dictionaries? That would get around 
>> the issue that not everyone has access to the datasets we used for 
>> validation and others are likely not able to share theirs either. And 
>> maybe we can replicate the notes if we want to simulate the scenario 
>> Kim is talking about with thousands or more notes.
>>
>> Tim
>>
>>
>> On 12/19/2014 10:24 AM, Kim Ebert wrote:
>> Guergana,
>>
>> I'm curious about the number of records that are in your gold standard 
>> sets, or if your gold standard set was run through a long running cTAKES 
>> process.
>> I know at some point we fixed a bug in the old dictionary lookup that 
>> caused the permutations to become corrupted over time. Typically this 
>> isn't seen in the first few records, but over time as patterns are 
>> used the permutations would become corrupted. This caused documents 
>> that were fed through cTAKES more than once to have fewer codes 
>> returned than the first time.
>>
>> For example, if a permutation of 4,2,3,1 was found, the permutation 
>> would be corrupted to be 1,2,3,4. It would no longer be possible to 
>> detect permutations of 4,2,3,1 until cTAKES was restarted. We got the 
>> fix in after the cTAKES 3.2.0 release. 
>> https://issues.apache.org/jira/browse/CTAKES-310
>> Depending upon the corpus size, I could see the permutation engine 
>> eventually only have a single permutation of 1,2,3,4.
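
A toy model of this failure mode, not the actual cTAKES code: it assumes the permutations live in a shared cache that is built once and reused for every document, and that the buggy path sorted a cached entry in place when it was found. The names `build_cache`, `lookup_buggy`, and `lookup_fixed` are made up for illustration.

```python
from typing import List, Optional


def build_cache() -> List[List[int]]:
    """Hypothetical permutation cache, built once and shared across documents."""
    return [[4, 2, 3, 1], [2, 1, 3, 4]]


def lookup_buggy(cache: List[List[int]], wanted: List[int]) -> Optional[List[int]]:
    """Return the sorted token order if `wanted` is cached, else None."""
    for perm in cache:
        if perm == wanted:
            perm.sort()  # BUG: in-place sort overwrites the cached entry
            return perm
    return None


def lookup_fixed(cache: List[List[int]], wanted: List[int]) -> Optional[List[int]]:
    """Same lookup, but the sort runs on a copy so the cache is untouched."""
    for perm in cache:
        if perm == wanted:
            return sorted(perm)  # sorted() returns a new list
    return None
```

With the buggy version, the first lookup of 4,2,3,1 succeeds and the second returns nothing, because the cached entry has become 1,2,3,4. The fixed version gives the same answer on every call, which is the behavior you would want for consistent output across a long-running process.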
>>
>> Typically though, this isn't very easily detected in the first 100 or 
>> so documents.
>>
>> We discovered this issue when we made cTAKES have consistent output of 
>> codes in our system.
>>
>> [IMAT Solutions]<http://imatsolutions.com>
>> Kim Ebert
>> Software Engineer
>> [Office:] 801.669.7342
>> kim.eb...@imatsolutions.com<mailto:greg.hub...@imatsolutions.com>
>> On 12/19/2014 07:05 AM, Savova, Guergana wrote:
>>
>> We are doing a similar kind of evaluation and will report the results.
>>
>> Before we released the Fast lookup, we did a systematic evaluation 
>> across three gold standard sets. We did not see the trend that Bruce 
>> reported below. The P, R and F1 results from the old dictionary look 
>> up and the fast one were similar.
>>
>> Thank you everyone!
>> --Guergana
>>
>> -----Original Message-----
>> From: David Kincaid [mailto:kincaid.d...@gmail.com]
>> Sent: Friday, December 19, 2014 9:02 AM
>> To: dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
>> Subject: Re: cTakes Annotation Comparison
>>
>> Thanks for this, Bruce! Very interesting work. It confirms what I've 
>> seen in my small tests that I've done in a non-systematic way. Did you 
>> happen to capture the number of false positives yet (annotations made 
>> by cTAKES that are not in the human adjudicated standard)? I've seen a 
>> lot of dictionary hits that are not actually entity mentions, but I 
>> haven't had a chance to do a systematic analysis (we're working on our 
>> annotated gold standard now). One great example is the antibiotic 
>> "Today". Every time the word today appears in any text it is annotated 
>> as a medication mention when it almost never is being used in that sense.
>>
>> These results by themselves are quite disappointing to me. Both the 
>> UMLSProcessor and especially the FastUMLSProcessor seem to have pretty 
>> poor recall. It seems like the trade off for more speed is a ten-fold 
>> (or more) decrease in entity recognition.
>>
>> Thanks again for sharing your results with us. I think they are very 
>> useful to the project.
>>
>> - Dave
>>
>> On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <
>> bruce.tiet...@perfectsearchcorp.com<mailto:
>> bruce.tiet...@perfectsearchcorp.com>> wrote:
>>
>>
>> Actually, we are working on a similar tool to compare it to the human 
>> adjudicated standard for the set we tested against.  I didn't mention 
>> it before because the tool isn't complete yet, but initial results for 
>> the set (excluding those marked as "CUI-less") were as follows:
>>
>> Human adjudicated annotations: 4591 (excluding CUI-less)
>>
>> Annotations found matching the human adjudicated standard
>> UMLSProcessor                  2245
>> FastUMLSProcessor           215
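
For reference, the counts above translate into recall (matched gold annotations divided by total gold annotations) as in the sketch below. Only the three numbers come from the message; the variable and function names are illustrative. Precision is deliberately left out, since the total annotation counts reported for each pipeline cover more annotation types than the gold standard does.

```python
GOLD_TOTAL = 4591  # human adjudicated annotations, excluding CUI-less

EXACT_MATCHES = {
    "UMLSProcessor": 2245,
    "FastUMLSProcessor": 215,
}


def recall(matched: int, gold_total: int) -> float:
    """Fraction of gold annotations that the pipeline reproduced."""
    if gold_total <= 0:
        raise ValueError("gold_total must be positive")
    return matched / gold_total


for name, matched in EXACT_MATCHES.items():
    print(f"{name}: recall = {recall(matched, GOLD_TOTAL):.3f}")
```

This works out to roughly 0.489 for the UMLSProcessor and 0.047 for the FastUMLSProcessor, about a ten-fold gap on exact matches.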
>>
>>
>>
>>
>>
>>
>> [image: IMAT Solutions] <http://imatsolutions.com>
>> Bruce Tietjen
>> Senior Software Engineer
>> [image: Mobile:] 801.634.1547
>> bruce.tiet...@imatsolutions.com<mailto:bruce.tiet...@imatsolutions.com>
>> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei 
>> <pei.c...@childrens.harvard.edu<mailto:pei.c...@childrens.harvard.edu>>
>> wrote:
>>
>>
>> Bruce,
>> Thanks for this-- very useful.
>> Perhaps Sean Finan can comment more, but it's also probably worth it 
>> to compare to an adjudicated human annotated gold standard.
>>
>> --Pei
>>
>> -----Original Message-----
>> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
>> Sent: Thursday, December 18, 2014 1:45 PM
>> To: dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
>> Subject: cTakes Annotation Comparison
>>
>> With the recent release of cTakes 3.2.1, we were very interested in 
>> checking for any differences in annotations between using the 
>> AggregatePlaintextUMLSProcessor pipeline and the 
>> AggregatePlaintextFastUMLSProcessor pipeline within this release of 
>> cTakes with its associated set of UMLS resources.
>>
>> We chose to use the SHARE 14-a-b Training data that consists of 199 
>> documents (Discharge  61, ECG 54, Echo 42 and Radiology 42) as the 
>> basis for the comparison.
>>
>> We decided to share a summary of the results with the development 
>> community.
>>
>> Documents Processed: 199
>>
>> Processing Time:
>> UMLSProcessor           2,439 seconds
>> FastUMLSProcessor    1,837 seconds
>>
>> Total Annotations Reported:
>> UMLSProcessor                  20,365 annotations
>> FastUMLSProcessor             8,284 annotations
>>
>>
>> Annotation Comparisons:
>> Annotations common to both sets:                                  3,940
>> Annotations reported only by the UMLSProcessor:         16,425
>> Annotations reported only by the FastUMLSProcessor:    4,344
>>
>>
>> If anyone is interested, the following was our test procedure:
>>
>> We used the UIMA CPE to process the document set twice, once using the 
>> AggregatePlaintextUMLSProcessor pipeline and once using the 
>> AggregatePlaintextFastUMLSProcessor pipeline. We used the 
>> WriteCAStoFile CAS consumer to write the results to output files.
>>
>> We used a tool we recently developed to analyze and compare the 
>> annotations generated by the two pipelines. The tool compares the two 
>> outputs for each file and reports any differences in the annotations 
>> (MedicationMention, SignSymptomMention, ProcedureMention, 
>> AnatomicalSiteMention, and DiseaseDisorderMention) between the two 
>> output sets. The tool reports the number of 'matches' and 'misses' 
>> between each annotation set. A 'match' is defined as the presence of an 
>> identified source text interval with its associated CUI appearing in 
>> both annotation sets. A 'miss' is defined as the presence of an 
>> identified source text interval and its associated CUI in one 
>> annotation set, but no matching identified source text interval and 
>> CUI in the other. The tool also reports the total number of 
>> annotations (source text intervals with associated CUIs) reported in 
>> each annotation set. The compare tool is in our GitHub repository at 
>> https://github.com/perfectsearch/cTAKES-compare
>>
>>
>>
>>
>>
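
The 'match' and 'miss' definitions in the original message map naturally onto set operations. The sketch below is not the code from the cTAKES-compare repository; it assumes each annotation can be reduced to a hashable (begin, end, CUI) tuple, and the function name and dictionary keys are made up.

```python
from typing import Dict, Iterable, Set, Tuple

# (begin offset, end offset, CUI); a hypothetical reduced form of an annotation
Key = Tuple[int, int, str]


def compare_sets(first: Iterable[Key], second: Iterable[Key]) -> Dict[str, Set[Key]]:
    """Split two annotation sets into matches and per-side misses."""
    a, b = set(first), set(second)
    return {
        "matches": a & b,      # same interval and CUI in both outputs
        "only_first": a - b,   # misses: reported by the first pipeline only
        "only_second": b - a,  # misses: reported by the second pipeline only
    }


# Sanity check on the reported summary: matches plus one-sided annotations
# should add up to each pipeline's total.
assert 3_940 + 16_425 == 20_365  # UMLSProcessor
assert 3_940 + 4_344 == 8_284    # FastUMLSProcessor
```

The reported summary is internally consistent under this reading: the common annotations plus each pipeline's one-sided annotations equal that pipeline's total.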
