Thanks Kim,
This sounds interesting, though I don't totally understand it. Are you saying 
that extraction performance for a given note depends on where the note sat in 
the processing queue? If so, that's pretty bad! If you (or anyone else who 
understands this issue) have a concrete example, I think that would help me 
understand what the problem is/was.

Even though, as Pei mentioned, we are going to try moving the community to the 
faster dictionary, I would like to understand this better, both to help myself 
avoid issues of this type going forward and to verify that the new dictionary 
doesn't use similar logic.

Also, when we finish annotating the sample notes, might we use those as a point 
of comparison for the two dictionaries? That would get around the issue that 
not everyone has access to the datasets we used for validation, and that others 
are likely not able to share theirs either. And maybe we can replicate the 
notes, as sketched below, if we want to simulate the scenario Kim is talking 
about with thousands of notes or more.
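
As a purely illustrative sketch (not an existing cTAKES utility), replicating a 
small set of notes enough times to drive a long-running pipeline could be as 
simple as the following; the directory paths and copy count are hypothetical 
command-line arguments:

    import java.io.IOException;
    import java.nio.file.*;

    public class ReplicateNotes {
        public static void main(String[] args) throws IOException {
            Path src = Paths.get(args[0]);          // directory of sample notes
            Path dst = Paths.get(args[1]);          // output directory
            int copies = Integer.parseInt(args[2]); // e.g. 1000
            Files.createDirectories(dst);
            try (DirectoryStream<Path> notes = Files.newDirectoryStream(src)) {
                for (Path note : notes) {
                    // Prefix each copy so the duplicated file names stay unique.
                    for (int i = 0; i < copies; i++) {
                        Files.copy(note, dst.resolve(i + "_" + note.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }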

Tim


On 12/19/2014 10:24 AM, Kim Ebert wrote:
Guergana,

I'm curious about the number of records in your gold standard sets, and whether 
your gold standard set was run through a long-running cTAKES process. I know at 
some point we fixed a bug in the old dictionary lookup that caused the 
permutations to become corrupted over time. Typically this isn't seen in the 
first few records, but over time, as patterns were used, the permutations would 
become corrupted. This caused documents that were fed through cTAKES more than 
once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be 
corrupted to 1,2,3,4. It would no longer be possible to detect permutations of 
4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 
release (https://issues.apache.org/jira/browse/CTAKES-310). Depending upon the 
corpus size, I could see the permutation engine eventually being left with only 
the single permutation 1,2,3,4.
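
To make the failure mode concrete, here is a minimal sketch (not the actual 
cTAKES source) of a shared permutation cache that gets sorted in place, so that 
the first successful match destroys the cached ordering:

    import java.util.*;

    public class PermutationBugSketch {
        // Shared cache of token-order permutations, reused across documents.
        static final List<List<Integer>> CACHE = new ArrayList<>();
        static {
            CACHE.add(new ArrayList<>(Arrays.asList(4, 2, 3, 1)));
        }

        static boolean matches(List<Integer> tokenOrder) {
            for (List<Integer> perm : CACHE) {
                if (perm.equals(tokenOrder)) {
                    Collections.sort(perm); // BUG: sorts the cached list in place
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            List<Integer> order = Arrays.asList(4, 2, 3, 1);
            System.out.println(matches(order)); // true: first document matches
            System.out.println(matches(order)); // false: cache now holds 1,2,3,4
        }
    }

Once the cached list has been sorted, every later document sees only 1,2,3,4, 
which is why the loss shows up gradually over a long run rather than in the 
first few records.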

Typically, though, this isn't very easily detected in the first 100 or so 
documents.

We discovered this issue when we made cTAKES produce a consistent set of output 
codes in our system.

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com
On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the Fast lookup, we did a systematic evaluation across three 
gold standard sets. We did not see the trend that Bruce reported below. The P, 
R and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana
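
For reference, a minimal sketch of how P, R and F1 fall out of match counts; 
the numbers below are placeholders, not results from any of the evaluations 
discussed in this thread:

    public class PrfSketch {
        public static void main(String[] args) {
            int truePositives = 900;  // system annotations matching the gold standard
            int systemTotal   = 1200; // all annotations the system produced
            int goldTotal     = 1500; // all annotations in the gold standard

            double precision = (double) truePositives / systemTotal;          // 0.750
            double recall    = (double) truePositives / goldTotal;            // 0.600
            double f1        = 2 * precision * recall / (precision + recall); // ~0.667
            System.out.printf("P=%.3f R=%.3f F1=%.3f%n", precision, recall, f1);
        }
    }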

-----Original Message-----
From: David Kincaid [mailto:kincaid.d...@gmail.com]
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in 
the small, non-systematic tests I've done. Did you happen to capture the number 
of false positives yet (annotations made by cTAKES that are not in the human 
adjudicated standard)? I've seen a lot of dictionary hits that are not actually 
entity mentions, but I haven't had a chance to do a systematic analysis (we're 
working on our annotated gold standard now). One great example is the 
antibiotic "Today". Every time the word today appears in any text, it is 
annotated as a medication mention, when it is almost never being used in that 
sense.

These results by themselves are quite disappointing to me. Both the 
UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor 
recall. It seems like the trade-off for more speed is a ten-fold (or more) 
decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to 
the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen 
<bruce.tiet...@perfectsearchcorp.com> wrote:


Actually, we are working on a similar tool to compare it to the human
adjudicated standard for the set we tested against.  I didn't mention
it before because the tool isn't complete yet, but initial results for
the set (excluding those marked as "CUI-less") was as follows:

Human adjudicated annotations: 4,591 (excluding CUI-less)

Annotations found matching the human adjudicated standard:
UMLSProcessor         2,245
FastUMLSProcessor       215
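
For reference, measured against the 4,591 human adjudicated annotations, these 
counts correspond to a recall of roughly 2245/4591 ≈ 0.49 for the 
UMLSProcessor and 215/4591 ≈ 0.05 for the FastUMLSProcessor, which is the 
roughly ten-fold gap Dave describes above.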


IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei 
<pei.c...@childrens.harvard.edu> wrote:


Bruce,
Thanks for this -- very useful.
Perhaps Sean Finan can comment more, but it's also probably worth comparing 
against an adjudicated human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTAKES 3.2.1, we were very interested in checking 
for any differences in annotations between the AggregatePlaintextUMLSProcessor 
pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this 
release of cTAKES, with its associated set of UMLS resources.

We chose to use the SHARE 14-a-b Training data, which consists of 199 
documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the basis for 
the comparison.

We decided to share a summary of the results with the development
community.

Documents Processed: 199

Processing Time:
UMLSProcessor           2,439 seconds
FastUMLSProcessor    1,837 seconds

Total Annotations Reported:
UMLSProcessor                  20,365 annotations
FastUMLSProcessor             8,284 annotations


Annotation Comparisons:
Annotations common to both sets:                                  3,940
Annotations reported only by the UMLSProcessor:         16,425
Annotations reported only by the FastUMLSProcessor:    4,344


If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using
the AggregatePlaintextUMLSProcessor pipeline and once using the
AggregatePlaintextFastUMLSProcessor pipeline. We used the
WriteCAStoFile CAS consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the annotations 
generated by the two pipelines. The tool compares the two outputs for each file 
and reports any differences in the annotations (MedicationMention, 
SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and 
DiseaseDisorderMention) between the two output sets. The tool reports the 
number of 'matches' and 'misses' between the two annotation sets. A 'match' is 
defined as the presence of an identified source text interval with its 
associated CUI appearing in both annotation sets. A 'miss' is defined as the 
presence of an identified source text interval and its associated CUI in one 
annotation set, but no matching identified source text interval and CUI in the 
other. The tool also reports the total number of annotations (source text 
intervals with associated CUIs) reported in each annotation set. The compare 
tool is in our GitHub repository at 
https://github.com/perfectsearch/cTAKES-compare
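
For anyone who wants the match/miss logic at a glance, here is a minimal 
sketch (not the actual cTAKES-compare source); annotations are keyed on the 
source text interval plus CUI, and the offsets and CUIs below are made up for 
illustration:

    import java.util.*;

    public class AnnotationCompareSketch {
        // An annotation keyed by its source text interval and CUI.
        record Key(int begin, int end, String cui) {}

        public static void main(String[] args) {
            Set<Key> slow = new HashSet<>(List.of(
                    new Key(10, 15, "C0011849"), new Key(42, 49, "C0027051")));
            Set<Key> fast = new HashSet<>(List.of(new Key(10, 15, "C0011849")));

            Set<Key> matches = new HashSet<>(slow);
            matches.retainAll(fast);          // present in both sets: 'matches'

            Set<Key> slowOnly = new HashSet<>(slow);
            slowOnly.removeAll(fast);         // 'misses' absent from the fast set
            Set<Key> fastOnly = new HashSet<>(fast);
            fastOnly.removeAll(slow);         // 'misses' absent from the slow set

            System.out.println("matches: " + matches.size());    // 1
            System.out.println("slow only: " + slowOnly.size()); // 1
            System.out.println("fast only: " + fastOnly.size()); // 0
        }
    }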



