Hi Bruce,

> Correction -- So far, I did steps 1 and 2 of Sean's email.

No problem. Aside from recreating the database, those two steps have the greatest impact. But before you change anything else, please do some manual spot checks. I have never seen a case where the lookup would be so horribly inaccurate.
Thanks

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 3:29 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Correction -- So far, I did steps 1 and 2 of Sean's email.

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Fri, Dec 19, 2014 at 1:22 PM, Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com> wrote:

Sean,

I tried the configuration changes you mentioned in your earlier email. The results are as follows:

Total annotations found: 12,161 (default configuration found 8,284)

Counting exact span matches, this run matched only 211 (the default configuration matched 215).

Counting overlapping spans, this run matched only 220 (the default configuration matched 224).

Bruce

On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Kim,

Maintenance is the deciding factor in forging ahead, not bugs/issues. They are two components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix it and move forward.)

We can keep the old component around for as long as needed, but it's likely going to have limited support...

--Pei

From: Kim Ebert [mailto:kim.eb...@imatsolutions.com]
Sent: Friday, December 19, 2014 1:47 PM
To: Chen, Pei; dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Pei,

I don't think bugs/issues should be part of determining whether one algorithm or the other is superior. Obviously the bugs are worth mentioning, but if the fast lookup method has worse precision and recall but better performance than the slower, more accurate first-word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues.

Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now, I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced yet either.

Kim Ebert
Software Engineer, IMAT Solutions <http://imatsolutions.com>
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 08:39 AM, Chen, Pei wrote:

Also check out the stats that Sean ran before releasing the new component:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and our experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned... I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now -- plagued with the bugs/issues you mentioned.)
--Pei

From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 10:25 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Guergana,

I'm curious about the number of records in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to 1,2,3,4, and it would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted (a minimal sketch of this failure mode follows below). We got the fix in after the cTAKES 3.2.0 release: https://issues.apache.org/jira/browse/CTAKES-310. Depending upon the corpus size, I could see the permutation engine eventually having only a single permutation, 1,2,3,4. Typically, though, this isn't easily detected in the first 100 or so documents.

We discovered this issue when we made cTAKES produce consistent output of codes in our system.

Kim Ebert
Software Engineer, IMAT Solutions <http://imatsolutions.com>
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana

-----Original Message-----
From: David Kincaid [mailto:kincaid.d...@gmail.com]
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic "Today": every time the word "today" appears in any text, it is annotated as a medication mention, when it is almost never used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave
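As a side note on the permutation corruption Kim describes above (CTAKES-310): the following is a minimal, hypothetical Java sketch of that failure mode, in which a cached permutation shared across documents is sorted in place, so the ordering 4,2,3,1 is lost after its first use. The class and method names are invented for illustration; this is not the actual cTAKES lookup code, and the real patch may differ from the "safe" variant shown.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Illustration only (hypothetical names, not the real cTAKES classes):
 * sorting a shared, cached permutation in place makes a long-running
 * pipeline lose orderings such as [4, 2, 3, 1] after the first document.
 */
public class PermutationCorruptionDemo {

    // One cached token ordering for a multi-word term, shared across documents.
    private static final List<Integer> CACHED_PERMUTATION =
            new ArrayList<>(Arrays.asList(4, 2, 3, 1));

    /** Order-sensitive match against the cached permutation. */
    static boolean matchesCached(List<Integer> observedOrder) {
        return CACHED_PERMUTATION.equals(observedOrder);
    }

    /** Buggy "canonicalization": sorts the cached list itself, mutating shared state. */
    static void buggyCanonicalize() {
        Collections.sort(CACHED_PERMUTATION);
    }

    /** Safe variant: sorts a defensive copy, leaving the cache untouched. */
    static List<Integer> safeCanonicalize() {
        List<Integer> copy = new ArrayList<>(CACHED_PERMUTATION);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> observed = Arrays.asList(4, 2, 3, 1);

        System.out.println(matchesCached(observed)); // true: first document matches
        safeCanonicalize();                          // cache is still [4, 2, 3, 1]
        System.out.println(matchesCached(observed)); // still true
        buggyCanonicalize();                         // cache is now [1, 2, 3, 4]
        System.out.println(matchesCached(observed)); // false: later documents miss this ordering
    }
}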
On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com> wrote:

Actually, we are working on a similar tool to compare against the human-adjudicated standard for the set we tested. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding annotations marked "CUI-less") were as follows:

Human-adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human-adjudicated standard:
  UMLSProcessor        2245
  FastUMLSProcessor     215

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Bruce,

Thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare to an adjudicated, human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes, with its associated set of UMLS resources.

We chose to use the SHARE 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42 and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
  UMLSProcessor        2,439 seconds
  FastUMLSProcessor    1,837 seconds

Total annotations reported:
  UMLSProcessor        20,365 annotations
  FastUMLSProcessor     8,284 annotations

Annotation comparisons:
  Annotations common to both sets:                      3,940
  Annotations reported only by the UMLSProcessor:      16,425
  Annotations reported only by the FastUMLSProcessor:   4,344

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. The tool reports the number of 'matches' and 'misses' between each annotation set.
A 'match' is defined as the presence of an identified source text interval, with its associated CUI, appearing in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) reported in each annotation set. The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
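To make the match/miss definition above concrete, here is a rough Java sketch of comparing two annotation sets on (source-text interval, CUI) pairs, covering both the exact-span and the overlapping-span criteria Bruce mentions in his later message. The Span record, the example spans and CUIs, and the countMatches helper are invented for illustration; this is not the code in the cTAKES-compare repository.

import java.util.List;

/**
 * Sketch of the match/miss rule described above: an annotation matches when
 * the same CUI appears in both sets on the exact same interval, or (in the
 * looser mode) on any overlapping interval.
 */
public class AnnotationMatcher {

    /** Minimal stand-in for an identified annotation: a span plus its CUI. */
    record Span(int begin, int end, String cui) { }

    static boolean exactMatch(Span a, Span b) {
        return a.begin() == b.begin() && a.end() == b.end() && a.cui().equals(b.cui());
    }

    static boolean overlapMatch(Span a, Span b) {
        // Half-open intervals [begin, end) overlap when each starts before the other ends.
        return a.begin() < b.end() && b.begin() < a.end() && a.cui().equals(b.cui());
    }

    /** Counts annotations in 'system' that have a counterpart in 'reference'. */
    static long countMatches(List<Span> system, List<Span> reference, boolean exact) {
        return system.stream()
                .filter(s -> reference.stream()
                        .anyMatch(r -> exact ? exactMatch(s, r) : overlapMatch(s, r)))
                .count();
    }

    public static void main(String[] args) {
        // Made-up spans and CUIs purely for demonstration.
        List<Span> gold = List.of(new Span(10, 18, "C0020538"));
        List<Span> sys  = List.of(new Span(10, 18, "C0020538"), new Span(40, 45, "C0004057"));

        long matches = countMatches(sys, gold, true);
        System.out.println("matches=" + matches + ", recall=" + (double) matches / gold.size());
    }
}

Against a gold standard, recall then falls out as matches divided by the gold count, e.g. 2245/4591 versus 215/4591 for the two pipelines in the figures Bruce reports above.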