ah! Excellent news... that's much more in line with our experience and evaluation results.
On Fri, Dec 19, 2014 at 5:04 PM, Bruce Tietjen <[email protected]> wrote:

My apologies to Sean and everyone,

I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list.

With the bug fixed, the results look MUCH better:

UMLSProcessor found 31,598 annotations
FastUMLSProcessor found 30,716 annotations

There were 23,522 annotations that were exact matches between the two.

When comparing against the gold standard annotations (4,591 annotations):

UMLSProcessor found 2,632 matches (2,735 including overlaps)
FastUMLSProcessor found 2,795 matches (2,842 including overlaps)

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen <[email protected]> wrote:

I'll do that -- there is always a possibility of bugs in the analysis tool.

On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean <[email protected]> wrote:

Sorry, I meant "do some spot checks on the validity". In other words, when your script reports that a CUI and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like.

Thanks,
Sean
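Beyond eyeballing a file in the CVD, a spot check like the one Sean describes can also be scripted against the UIMA API. A minimal sketch, assuming a type system descriptor named TypeSystem.xml and an .xmi path passed as the first argument (both file names are assumptions, not something from this thread); the CUIs themselves live in each mention's ontologyConceptArr FSArray, the kind of list Bruce's analysis bug was truncating:

    import java.io.FileInputStream;
    import org.apache.uima.UIMAFramework;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.impl.XmiCasDeserializer;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.util.CasCreationUtils;
    import org.apache.uima.util.XMLInputSource;

    public class XmiSpotCheck {
        public static void main(String[] args) throws Exception {
            // Load the same type system the pipeline used (path is an assumption).
            TypeSystemDescription tsd = UIMAFramework.getXMLParser()
                .parseTypeSystemDescription(new XMLInputSource("TypeSystem.xml"));
            CAS cas = CasCreationUtils.createCas(tsd, null, null);

            // Deserialize one pipeline output .xmi into the CAS.
            try (FileInputStream in = new FileInputStream(args[0])) {
                XmiCasDeserializer.deserialize(in, cas);
            }

            // Print every annotation's type, span, and covered text for manual review.
            for (AnnotationFS ann : cas.getAnnotationIndex()) {
                System.out.printf("%s [%d,%d] '%s'%n", ann.getType().getName(),
                        ann.getBegin(), ann.getEnd(), ann.getCoveredText());
            }
        }
    }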
From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:37 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

My original results were from a newly downloaded cTAKES 3.2.1 with the separately downloaded resources copied in; there were no changes to any of the configuration files.

For this last run, I modified UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified ones I used (but they may not get through the mailing list).

On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <[email protected]> wrote:

Hi Bruce,

I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap, so decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd only have two matches per document (100 docs?).

Thanks,
Sean

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:23 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Sean,

I tried the configuration changes you mentioned in your earlier email. The results are as follows:

Total annotations found: 12,161 (default configuration found 8,284)

Counting exact span matches, this run matched only 211 (default configuration matched 215).

Counting overlapping spans, this run matched only 220 (default configuration matched 224).

Bruce

On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <[email protected]> wrote:

Kim,

Maintenance is the deciding factor in forging ahead, not bugs/issues.

They are 2 components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix them and move forward.)

We can keep the old component around for as long as needed, but it's likely going to have limited support...

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 1:47 PM
To: Chen, Pei; [email protected]
Subject: Re: cTakes Annotation Comparison

Pei,

I don't think bugs/issues should be part of determining whether one algorithm is superior to the other. Obviously it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance than the slower, more accurate first-word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues.

Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced yet either.

On 12/19/2014 08:39 AM, Chen, Pei wrote:

Also check out the stats that Sean ran before releasing the new component:

http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned... I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now -- plagued with bugs/issues as you mentioned.)

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 10:25 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Guergana,

I'm curious about the number of records in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release: https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually being left with only the single permutation 1,2,3,4. Typically, though, this isn't easily detected in the first 100 or so documents.

We discovered this issue when we made cTAKES produce consistent output of codes in our system.
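The corruption Kim describes is a classic shared-mutable-state bug: if the lookup normalizes a stored permutation in place while matching, the stored order is destroyed for every later document. An illustrative sketch of that failure mode and the defensive-copy fix -- this is not the actual cTAKES code, and the in-place sort stands in for whatever the real normalization step was (see CTAKES-310 for the real change):

    import java.util.Arrays;
    import java.util.List;

    public class PermutationCorruption {

        // Permutation patterns shared across all documents, as in the old lookup.
        static final List<int[]> PERMUTATIONS = List.of(new int[]{4, 2, 3, 1});

        // Buggy matcher: sorting the shared array in place corrupts 4,2,3,1 into
        // 1,2,3,4, so the original order can never match again until restart.
        static boolean matchesBuggy(int[] tokenOrder) {
            for (int[] perm : PERMUTATIONS) {
                Arrays.sort(perm);                  // mutates the shared pattern!
                if (Arrays.equals(perm, tokenOrder)) return true;
            }
            return false;
        }

        // Fixed matcher: normalize a defensive copy; the stored pattern survives.
        static boolean matchesFixed(int[] tokenOrder) {
            for (int[] perm : PERMUTATIONS) {
                int[] copy = perm.clone();
                Arrays.sort(copy);                  // hypothetical normalization step
                if (Arrays.equals(copy, tokenOrder)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            matchesBuggy(new int[]{1, 2, 3, 4});
            // The stored pattern is now corrupted for every later document:
            System.out.println(Arrays.toString(PERMUTATIONS.get(0))); // [1, 2, 3, 4]
        }
    }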
On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below; the P, R and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana

From: David Kincaid [mailto:[email protected]]
Sent: Friday, December 19, 2014 9:02 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic "Today": every time the word today appears in any text, it is annotated as a medication mention, when it is almost never being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave
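For anyone who wants to put P, R and F1 on numbers like these, the arithmetic follows directly from the match counts. A minimal sketch using Bruce's initial figures quoted below (4,591 gold annotations; 2,245 and 215 matches; 20,365 and 8,284 total system annotations) -- treating each system's total annotation count as the precision denominator is an assumption about how the counts line up, not something stated in the thread:

    public class PrfMetrics {

        static void report(String name, int matches, int systemTotal, int goldTotal) {
            double precision = (double) matches / systemTotal; // matched / all system hits
            double recall    = (double) matches / goldTotal;   // matched / all gold spans
            double f1        = 2 * precision * recall / (precision + recall);
            System.out.printf("%-18s P=%.3f R=%.3f F1=%.3f%n", name, precision, recall, f1);
        }

        public static void main(String[] args) {
            int gold = 4591; // human-adjudicated annotations, excluding CUI-less
            report("UMLSProcessor", 2245, 20365, gold);
            report("FastUMLSProcessor", 215, 8284, gold);
        }
    }

On these inputs the recall gap (roughly 0.49 vs 0.05) is the ten-fold difference Dave describes.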
On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <[email protected]> wrote:

Actually, we are working on a similar tool to compare against the human-adjudicated standard for the set we tested. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding annotations marked "CUI-less") were as follows:

Human-adjudicated annotations: 4,591 (excluding CUI-less)

Annotations found matching the human-adjudicated standard:
UMLSProcessor          2,245
FastUMLSProcessor        215

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <[email protected]> wrote:

Bruce,

Thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare against an adjudicated, human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Thursday, December 18, 2014 1:45 PM
To: [email protected]
Subject: cTakes Annotation Comparison

With the recent release of cTAKES 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release, with its associated set of UMLS resources.

We chose the SHARE 14-a-b training data, 199 documents (Discharge 61, ECG 54, Echo 42, Radiology 42), as the basis for the comparison, and decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
UMLSProcessor          2,439 seconds
FastUMLSProcessor      1,837 seconds

Total annotations reported:
UMLSProcessor          20,365 annotations
FastUMLSProcessor       8,284 annotations

Annotation comparisons:
Annotations common to both sets:                      3,940
Annotations reported only by the UMLSProcessor:      16,425
Annotations reported only by the FastUMLSProcessor:   4,344

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once with the AggregatePlaintextUMLSProcessor pipeline and once with the AggregatePlaintextFastUMLSProcessor pipeline, and used the WriteCAStoFile CAS consumer to write the results to output files.
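A CPE run like this can also be driven headless instead of through the CPE GUI; a minimal sketch along the lines of UIMA's SimpleRunCPE example (the descriptor path is an assumption):

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.collection.CollectionProcessingEngine;
    import org.apache.uima.collection.metadata.CpeDescription;
    import org.apache.uima.util.XMLInputSource;

    public class RunCpe {
        public static void main(String[] args) throws Exception {
            // Parse the CPE descriptor that wires reader -> pipeline -> WriteCAStoFile.
            CpeDescription desc = UIMAFramework.getXMLParser()
                .parseCpeDescription(new XMLInputSource("desc/MyCpe.xml"));

            // Instantiate and start the collection processing engine.
            CollectionProcessingEngine cpe =
                UIMAFramework.produceCollectionProcessingEngine(desc);
            cpe.process(); // runs asynchronously; add a StatusCallbackListener to track completion
        }
    }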
We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. It reports the number of 'matches' and 'misses' between the annotation sets. A 'match' is defined as an identified source text interval, with its associated CUI, appearing in both annotation sets. A 'miss' is defined as an identified source text interval and associated CUI present in one annotation set with no matching source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) in each annotation set. The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
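That match/miss definition reduces to set operations over (begin, end, CUI) triples. A minimal sketch of the core comparison (hypothetical code with made-up sample CUIs, not the contents of the cTAKES-compare repository):

    import java.util.HashSet;
    import java.util.Set;

    public class AnnotationCompare {

        // One identified source text interval with its associated CUI.
        record Hit(int begin, int end, String cui) {}

        public static void main(String[] args) {
            Set<Hit> a = Set.of(new Hit(0, 5, "C0011849"),
                                new Hit(10, 17, "C0020538"));
            Set<Hit> b = Set.of(new Hit(0, 5, "C0011849"));

            // Matches: span + CUI present in both annotation sets.
            Set<Hit> matches = new HashSet<>(a);
            matches.retainAll(b);

            // Misses: span + CUI present in one set but not the other.
            Set<Hit> onlyA = new HashSet<>(a); onlyA.removeAll(b);
            Set<Hit> onlyB = new HashSet<>(b); onlyB.removeAll(a);

            System.out.printf("matches=%d onlyA=%d onlyB=%d (totals %d/%d)%n",
                matches.size(), onlyA.size(), onlyB.size(), a.size(), b.size());
        }
    }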
