RE: cTakes Annotation Comparison

Finan, Sean Fri, 19 Dec 2014 11:14:14 -0800

Our human annotators on ShARe used UMLS 2012AB.  I mention it because when I
did manual spot-checks between human and system annotations, I hit
head-scratchers that turned out to be differences in the UMLS version.  I first
noticed these discrepancies before I started working on the fast lookup
(that is, while working with the default lookup).

From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 1:40 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Sean,

I don't think that would be an issue since both the rare word lookup and the 
first word lookup are using UMLS 2011AB. Or is the rare word lookup using a 
different dictionary?

I would expect roughly similar results between the two when it comes to 
differences between UMLS versions.

Kim Ebert
Software Engineer
IMAT Solutions <http://imatsolutions.com>
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 11:31 AM, Finan, Sean wrote:

One quick mention:

The cTakes dictionaries are built with UMLS 2011AB.  If the human annotations
were not done using the same UMLS version then there WILL be differences in CUI
and Semantic group.  I don't have time to go into details or examples; just be
aware that every 6 months CUIs are added, removed, deprecated, and moved from
one TUI to another.

Sean

-----Original Message-----
From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu]
Sent: Friday, December 19, 2014 1:28 PM
To: dev@ctakes.apache.org
Subject: RE: cTakes Annotation Comparison

Several thoughts:

1. The ShARe corpus annotates only mentions of type Diseases/Disorders and only
Anatomical Sites associated with a Disease/Disorder. This is by design. cTAKES 
annotates all mentions of types Diseases/Disorders, Signs/Symptoms, Procedures, 
Medications and Anatomical Sites. Therefore you will get MANY more annotations 
with cTAKES. Eventually the ShARe corpus will be expanded to the other types.

2. Keeping (1) in mind, you can approximately estimate the precision/recall/f1 
of cTAKES on the ShARe corpus if you output only mentions of type 
Disease/Disorder.

3. Could you send us the list of files you use from ShARe to test? We have the
corpus and would like to run against it as well.
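
A minimal sketch of the estimate described in point 2, assuming each
annotation is reduced to a (begin, end, CUI, type) record and that a true
positive requires an exact span and CUI match. The `Mention` record, the
`prf1` helper, and the type label strings are hypothetical names chosen for
illustration; they are not part of cTAKES or the ShARe tooling.

```python
from typing import Iterable, NamedTuple, Tuple


class Mention(NamedTuple):
    """Hypothetical flattened annotation record (not a cTAKES type)."""
    begin: int
    end: int
    cui: str
    sem_type: str


def prf1(gold: Iterable[Mention], system: Iterable[Mention],
         keep_type: str = "DiseaseDisorder") -> Tuple[float, float, float]:
    """Precision, recall and F1 restricted to one mention type.

    Mentions of any other type are dropped from both sides before scoring,
    so extra system output of other types does not count as false positives.
    """
    gold_keys = {(m.begin, m.end, m.cui) for m in gold if m.sem_type == keep_type}
    sys_keys = {(m.begin, m.end, m.cui) for m in system if m.sem_type == keep_type}

    true_pos = len(gold_keys & sys_keys)
    precision = true_pos / len(sys_keys) if sys_keys else 0.0
    recall = true_pos / len(gold_keys) if gold_keys else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Filtering before scoring is the important step here: without it, every
Sign/Symptom, Procedure, Medication or unassociated Anatomical Site mention
would be scored as a false positive against a gold standard that never
annotated those types.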

Hope this makes sense...

--Guergana

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 1:16 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Our analysis against the human adjudicated gold standard from this ShARe corpus
is using a simple check to see if the cTakes output included the annotation 
specified by the gold standard. The initial results I reported were for exact 
matches of CUI and text span.  Only exact matches were counted.

It looks like if we also count as matches cTakes annotations with a matching
CUI and a text span that overlaps the gold standard text span, then the matches
increase to 224 matching annotations for the FastUMLS pipeline and 2319 for the
old pipeline.
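
The two matching rules described above (exact span versus overlapping span,
both requiring the same CUI) can be sketched as follows. This is an
illustration only, not the comparison tool's code; it assumes annotations are
(begin, end, CUI) tuples with half-open character offsets, and the function
names are hypothetical.

```python
def spans_overlap(a_begin, a_end, b_begin, b_end):
    """True if half-open spans [a_begin, a_end) and [b_begin, b_end) share text."""
    return a_begin < b_end and b_begin < a_end


def count_matches(gold, system, allow_overlap=False):
    """Count gold annotations matched by at least one system annotation.

    gold, system: iterables of (begin, end, cui) tuples.
    With allow_overlap=False only identical spans count; with
    allow_overlap=True any overlapping span with the same CUI counts.
    """
    system = list(system)
    matched = 0
    for g_begin, g_end, g_cui in gold:
        for s_begin, s_end, s_cui in system:
            if s_cui != g_cui:
                continue
            exact = (s_begin, s_end) == (g_begin, g_end)
            if exact or (allow_overlap and spans_overlap(g_begin, g_end, s_begin, s_end)):
                matched += 1
                break  # each gold annotation is counted at most once
    return matched
```

Because an exact span always overlaps itself, the overlap count can never be
lower than the exact count, which is in line with the small increases reported
above.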

The question was also asked about annotations in the cTakes output that were 
not in the human adjudicated gold standard. The answer is yes, there were a lot 
of additional annotations made by cTakes that don't appear to be in the gold 
standard. We haven't analyzed that yet, but it looks like the gold standard we 
are using may only have Disease_Disorder annotations.

Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Fri, Dec 19, 2014 at 9:54 AM, Miller, Timothy
<timothy.mil...@childrens.harvard.edu> wrote:

Thanks Kim,

This sounds interesting, though I don't totally understand it. Are you saying
that extraction performance for a given note depends on where the note was in
the processing queue? If so, that's pretty bad! If you (or anyone else who
understands this issue) have a concrete example, I think that might help me
understand what the problem is/was. Even though, as Pei mentioned, we are
going to try moving the community to the faster dictionary, I would like to
understand it better, just to help myself avoid issues of this type going
forward (and verify the new dictionary doesn't use similar logic).

Also, when we finish annotating the sample notes, might we use that as a point
of comparison for the two dictionaries? That would get around the issue that
not everyone has access to the datasets we used for validation, and others are
likely not able to share theirs either. And maybe we can replicate the notes
if we want to simulate the scenario Kim is talking about, with thousands or
more notes.

Tim

On 12/19/2014 10:24 AM, Kim Ebert wrote:

Guergana,

I'm curious about the number of records in your gold standard sets, or whether
your gold standard set was run through a long-running cTAKES process.

I know at some point we fixed a bug in the old dictionary lookup that caused
the permutations to become corrupted over time. Typically this isn't seen in
the first few records, but over time, as patterns are used, the permutations
would become corrupted. This caused documents that were fed through cTAKES
more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be
corrupted to 1,2,3,4. It would no longer be possible to detect permutations of
4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0
release.

https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually
having only a single permutation of 1,2,3,4.

Typically, though, this isn't easily detected in the first 100 or so
documents.

We discovered this issue when we made cTAKES produce consistent output of
codes in our system.
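
To make the failure mode concrete, here is a hypothetical illustration of how
a shared, cached permutation list can be corrupted when a caller sorts a hit
in place. It is not the cTAKES code and is not taken from CTAKES-310; the
function names and the in-place sort are assumptions chosen only to reproduce
the symptom described above (4,2,3,1 becoming 1,2,3,4 until restart).

```python
from itertools import permutations


def build_permutations(n):
    """Build a cache of all orderings of 1..n as mutable lists."""
    return [list(p) for p in permutations(range(1, n + 1))]


def lookup_buggy(cache, wanted):
    """Find `wanted` in the cache, then sort the cached list in place.

    The in-place sort mutates the shared cache entry, so the same
    ordering can never be found again for the lifetime of the cache.
    """
    for perm in cache:
        if perm == wanted:
            perm.sort()  # BUG: modifies the cached permutation itself
            return True
    return False


def lookup_fixed(cache, wanted):
    """Same lookup, but any reordering is done on a copy."""
    for perm in cache:
        if perm == wanted:
            _ordered = sorted(perm)  # sorted() returns a new list
            return True
    return False
```

With a cache built by `build_permutations(4)`, the first
`lookup_buggy(cache, [4, 2, 3, 1])` succeeds and the second fails, while
`lookup_fixed` succeeds every time. Only rebuilding the cache (the analogue of
restarting the process) restores the lost entry in the buggy version.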

Kim Ebert
Software Engineer
IMAT Solutions <http://imatsolutions.com>
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the fast lookup, we did a systematic evaluation across
three gold standard sets. We did not see the trend that Bruce reported below.
The P, R and F1 results from the old dictionary lookup and the fast one were
similar.

Thank you everyone!

--Guergana

-----Original Message-----
From: David Kincaid [mailto:kincaid.d...@gmail.com]
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in
the small, non-systematic tests I've done. Did you happen to capture the
number of false positives yet (annotations made by cTAKES that are not in the
human adjudicated standard)? I've seen a lot of dictionary hits that are not
actually entity mentions, but I haven't had a chance to do a systematic
analysis (we're working on our annotated gold standard now). One great example
is the antibiotic "Today". Every time the word "today" appears in any text, it
is annotated as a medication mention, when it is almost never being used in
that sense.

These results by themselves are quite disappointing to me. Both the
UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor
recall. It seems like the trade-off for more speed is a ten-fold (or more)
decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to
the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen
<bruce.tiet...@perfectsearchcorp.com> wrote:

Actually, we are working on a similar tool to compare the cTakes output to the
human adjudicated standard for the set we tested against.  I didn't mention it
before because the tool isn't complete yet, but initial results for the set
(excluding those marked as "CUI-less") were as follows:

Human adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human adjudicated standard:

UMLSProcessor        2245
FastUMLSProcessor     215
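
As a quick check of what these counts imply, the arithmetic below turns them
into recall figures. It uses only the numbers reported above and assumes each
counted match corresponds to one distinct gold annotation; precision cannot be
derived from these figures alone.

```python
# Figures copied from the message above.
gold_total = 4591
matches = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

# Recall = gold annotations found / gold annotations in the standard.
recall = {name: found / gold_total for name, found in matches.items()}
for name, value in recall.items():
    print(f"{name}: recall = {value:.3f}")
# UMLSProcessor: recall = 0.489
# FastUMLSProcessor: recall = 0.047

# Ratio between the two pipelines' match counts.
ratio = matches["UMLSProcessor"] / matches["FastUMLSProcessor"]
print(f"ratio = {ratio:.1f}")
# ratio = 10.4
```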

Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <pei.c...@childrens.harvard.edu>
wrote:

Bruce,

Thanks for this-- very useful.

Perhaps Sean Finan can comment more, but it's also probably worth comparing to
an adjudicated, human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTakes 3.2.1, we were very interested in checking
for any differences in annotations between using the
AggregatePlaintextUMLSProcessor pipeline and the
AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes
with its associated set of UMLS resources.

We chose to use the ShARe 14-a-b Training data, which consists of 199
documents (Discharge 61, ECG 54, Echo 42 and Radiology 42), as the basis for
the comparison.

We decided to share a summary of the results with the development community.

Documents Processed: 199

Processing Time:
UMLSProcessor        2,439 seconds
FastUMLSProcessor    1,837 seconds

Total Annotations Reported:
UMLSProcessor        20,365 annotations
FastUMLSProcessor     8,284 annotations

Annotation Comparisons:
Annotations common to both sets:                      3,940
Annotations reported only by the UMLSProcessor:      16,425
Annotations reported only by the FastUMLSProcessor:   4,344

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using the
AggregatePlaintextUMLSProcessor pipeline and once using the
AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS
consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the annotations
generated by the two pipelines. The tool compares the two outputs for each
file and reports any differences in the annotations (MedicationMention,
SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and
DiseaseDisorderMention) between the two output sets. The tool reports the
number of 'matches' and 'misses' between each annotation set. A 'match' is
defined as the presence of an identified source text interval with its
associated CUI appearing in both annotation sets. A 'miss' is defined as the
presence of an identified source text interval and its associated CUI in one
annotation set, but no matching identified source text interval and CUI in the
other. The tool also reports the total number of annotations (source text
intervals with associated CUIs) reported in each annotation set. The compare
tool is in our GitHub repository at

https://github.com/perfectsearch/cTAKES-compare
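
The match/miss definitions above amount to a set comparison. The sketch below
is not the code from the repository; it assumes each annotation has already
been reduced to a (begin, end, CUI) tuple, and `compare_annotation_sets` is a
hypothetical name.

```python
def compare_annotation_sets(set_a, set_b):
    """Compare two collections of (begin, end, cui) annotations.

    A 'match' is an interval+CUI present in both sets; a 'miss' is an
    interval+CUI present in exactly one of them.
    """
    a, b = set(set_a), set(set_b)
    return {
        "matches": len(a & b),   # common to both sets
        "only_a": len(a - b),    # misses: reported only in the first set
        "only_b": len(b - a),    # misses: reported only in the second set
        "total_a": len(a),
        "total_b": len(b),
    }
```

Under this definition each total equals the matches plus that side's misses,
and the summary figures reported above are consistent with that:
3,940 + 16,425 = 20,365 for the UMLSProcessor and 3,940 + 4,344 = 8,284 for
the FastUMLSProcessor.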
