Re: cTakes Annotation Comparison

Bruce Tietjen Fri, 19 Dec 2014 12:51:00 -0800

I'll do that -- there is always a possibility of bugs in the analysis tool.




Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean <
[email protected]> wrote:
>
>  Sorry, I meant “Do some spot checks on the validity”.  In other words,
> when your script reports that a cui and/or span is missing, manually look
> at the data and see if it really is.  Just open up one .xmi in the CVD and
> see what it looks like.
>
>
>
> Thanks,
>
> Sean
>
>
>
> *From:* Bruce Tietjen [mailto:[email protected]]
> *Sent:* Friday, December 19, 2014 3:37 PM
> *To:* [email protected]
> *Subject:* Re: cTakes Annotation Comparison
>
>
>
> My original results were using a newly downloaded cTakes 3.2.1 with the
> separately downloaded resources copied in. There were no changes to any of
> the configuration files.
>
> As far as this last run, I modified the UMLSLookupAnnotator.xml and
> AggregatePlaintextFastUMLSProcessor.xml.  I've attached the modified ones I
> used (but they may not get through the mailing list).
>
>
> On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <
> [email protected]> wrote:
>
> Hi Bruce,
>
> I'm not sure how there would be fewer matches with the overlap processor.
> There should be all of the matches from the non-overlap processor plus
> those from the overlap.  Decreasing from 215 to 211 is strange.  Have you
> done any manual spot checks on this?  It is really bizarre that you'd only
> have two matches per document (100 docs?).
>
> Thanks,
> Sean
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:[email protected]]
> Sent: Friday, December 19, 2014 3:23 PM
> To: [email protected]
> Subject: Re: cTakes Annotation Comparison
>
> Sean,
>
> I tried the configuration changes you mentioned in your earlier email.
>
> The results are as follows:
>
> Total Annotations found: 12,161 (default configuration found 8,284)
>
> If counting exact span matches, this run only matched 211 (default
> configuration matched 215).
>
> If counting overlapping spans, this run only matched 220 (default
> configuration matched 224)
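The two counts above differ only in how strictly a span has to line up. A minimal sketch of that distinction, assuming each annotation is a (begin, end, CUI) triple with half-open character offsets and that an overlap match still requires the same CUI. The names `Annotation`, `exact_match`, `overlap_match` and `count_matches`, and the sample data, are hypothetical and are not taken from the compare tool:

```python
from typing import Callable, List, NamedTuple


class Annotation(NamedTuple):
    begin: int  # start offset in the source text (inclusive)
    end: int    # end offset in the source text (exclusive)
    cui: str    # UMLS concept identifier


def exact_match(a: Annotation, b: Annotation) -> bool:
    # Same span boundaries and same CUI.
    return a == b


def overlap_match(a: Annotation, b: Annotation) -> bool:
    # Same CUI and the two spans share at least one character.
    return a.cui == b.cui and a.begin < b.end and b.begin < a.end


def count_matches(gold: List[Annotation],
                  system: List[Annotation],
                  matcher: Callable[[Annotation, Annotation], bool]) -> int:
    # Count gold annotations matched by at least one system annotation.
    return sum(1 for g in gold if any(matcher(g, s) for s in system))


# Made-up sample data, for illustration only.
gold = [Annotation(0, 10, "C0000001"),
        Annotation(20, 30, "C0000002"),
        Annotation(40, 50, "C0000003")]
system = [Annotation(0, 10, "C0000001"),   # exact hit
          Annotation(22, 35, "C0000002"),  # overlapping span, same CUI
          Annotation(40, 50, "C9999999")]  # same span, different CUI

exact_count = count_matches(gold, system, exact_match)
overlap_count = count_matches(gold, system, overlap_match)
print(exact_count, overlap_count)
```

Under these definitions every exact match is also an overlap match, so the overlap count can never be lower than the exact count, which is consistent with 220 versus 211 above.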
>
> Bruce
>
>
> On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <
> [email protected]>
> wrote:
> >
> >  Kim,
> >
> > Maintenance, rather than bugs/issues, is the reason to forge ahead.
> >
> > They are 2 components that do the same thing with the same goal. As
> > Sean mentioned, one should be able to configure the new code base to
> > replicate the old algorithm if required; it’s just a simpler and
> > cleaner code base. If this is not the case or if there are issues, we
> > should fix it and move forward.
> >
> > We can keep the old component around for as long as needed, but it’s
> > likely going to have limited support…
> >
> > --Pei
> >
> >
> >
> > *From:* Kim Ebert [mailto:[email protected]]
> > *Sent:* Friday, December 19, 2014 1:47 PM
> > *To:* Chen, Pei; [email protected]
> >
> > *Subject:* Re: cTakes Annotation Comparison
> >
> >
> >
> > Pei,
> >
> > I don't think bugs/issues should be part of determining if one
> > algorithm vs the other is superior. Obviously, it is worth mentioning
> > the bugs, but if the fast lookup method has worse precision and recall
> > but better performance, vs the slower but more accurate first word
> > lookup algorithm, then time should be invested in fixing those bugs
> > and resolving those weird issues.
> >
> > Now I'm not saying which one is superior in this case, as the data
> > will end up speaking for itself one way or the other; but as of right
> > now, I'm not convinced that the old dictionary lookup is obsolete,
> > and I'm not sure the community is convinced either.
> >
> >
> >
> > Kim Ebert
> > Software Engineer
> > IMAT Solutions <http://imatsolutions.com>
> > Office: 801.669.7342
> > [email protected]
> >
> > On 12/19/2014 08:39 AM, Chen, Pei wrote:
> >
> > Also check out stats that Sean ran before releasing the new component on:
> >
> >
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx
> >
> > From the evaluation and experience, the new lookup algorithm should be
> > a huge improvement in terms of both speed and accuracy.
> >
> > This is very different than what Bruce mentioned…  I’m sure Sean will
> > chime in here.
> >
> > (The old dictionary lookup is essentially obsolete now- plagued with
> > bugs/issues as you mentioned.)
> >
> > --Pei
> >
> >
> >
> > *From:* Kim Ebert [mailto:[email protected]
> > <[email protected]>]
> > *Sent:* Friday, December 19, 2014 10:25 AM
> > *To:* [email protected]
> > *Subject:* Re: cTakes Annotation Comparison
> >
> >
> >
> > Guergana,
> >
> > I'm curious about the number of records that are in your gold standard
> > sets, and whether your gold standard set was run through a long-running
> > cTAKES process.
> > I know at some point we fixed a bug in the old dictionary lookup that
> > caused the permutations to become corrupted over time. Typically this
> > isn't seen in the first few records, but over time as patterns are
> > used the permutations would become corrupted. This caused documents
> > that were fed through cTAKES more than once to have fewer codes
> > returned than the first time.
> >
> > For example, if a permutation of 4,2,3,1 was found, the permutation
> > would be corrupted to be 1,2,3,4. It would no longer be possible to
> > detect permutations of 4,2,3,1 until cTAKES was restarted. We got the
> > fix in after the cTAKES 3.2.0 release.
> > https://issues.apache.org/jira/browse/CTAKES-310
> > Depending upon the corpus size, I could see the permutation engine
> > eventually only having a single permutation of 1,2,3,4.
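The symptom described above (a stored permutation silently turning into 1,2,3,4 and never matching again until restart) is what in-place mutation of shared state tends to look like. The following is a hypothetical illustration only: it is not the cTAKES code and not the CTAKES-310 fix, and the function names and the list-of-lists cache are assumptions made for the sketch.

```python
from typing import List, Optional


def lookup_buggy(cache: List[List[int]], found: List[int]) -> Optional[List[int]]:
    # Hypothetical bug: the matched permutation is normalized in place,
    # which overwrites the entry stored in the shared cache.
    for perm in cache:
        if perm == found:
            perm.sort()  # mutates the cached list itself
            return perm
    return None


def lookup_fixed(cache: List[List[int]], found: List[int]) -> Optional[List[int]]:
    # Same lookup, but the normalized form is a new list, so the
    # cached permutation is left untouched.
    for perm in cache:
        if perm == found:
            return sorted(perm)
    return None


buggy_cache = [[1, 2, 3, 4], [4, 2, 3, 1]]
first_buggy = lookup_buggy(buggy_cache, [4, 2, 3, 1])   # found on first use
second_buggy = lookup_buggy(buggy_cache, [4, 2, 3, 1])  # no longer found

fixed_cache = [[1, 2, 3, 4], [4, 2, 3, 1]]
first_fixed = lookup_fixed(fixed_cache, [4, 2, 3, 1])
second_fixed = lookup_fixed(fixed_cache, [4, 2, 3, 1])

print(second_buggy, buggy_cache)
print(second_fixed, fixed_cache)
```

In the buggy variant the cache degrades as more patterns are seen, which would explain why the effect is hard to notice in the first documents of a run and why reprocessing the same document can return fewer codes.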
> >
> > Typically though, this isn't very easily detected in the first 100 or
> > so documents.
> >
> > We discovered this issue when we made cTAKES have consistent output of
> > codes in our system.
> >
> >
> > On 12/19/2014 07:05 AM, Savova, Guergana wrote:
> >
> > We are doing a similar kind of evaluation and will report the results.
> >
> >
> >
> > Before we released the Fast lookup, we did a systematic evaluation
> > across three gold standard sets. We did not see the trend that Bruce
> > reported below. The P, R and F1 results from the old dictionary lookup
> > and the fast one were similar.
> >
> >
> >
> > Thank you everyone!
> >
> > --Guergana
> >
> >
> >
> > -----Original Message-----
> > From: David Kincaid [mailto:[email protected]]
> > Sent: Friday, December 19, 2014 9:02 AM
> > To: [email protected]
> > Subject: Re: cTakes Annotation Comparison
> >
> >
> >
> > Thanks for this, Bruce! Very interesting work. It confirms what I've
> > seen in the small, non-systematic tests I've done. Did you happen to
> > capture the number of false positives yet (annotations made by cTAKES
> > that are not in the human adjudicated standard)? I've seen a lot of
> > dictionary hits that are not actually entity mentions, but I haven't
> > had a chance to do a systematic analysis (we're working on our
> > annotated gold standard now). One great example is the antibiotic
> > "Today". Every time the word today appears in any text it is annotated
> > as a medication mention, when it is almost never being used in that
> > sense.
> >
> > These results by themselves are quite disappointing to me. Both the
> > UMLSProcessor and especially the FastUMLSProcessor seem to have pretty
> > poor recall. It seems like the trade-off for more speed is a ten-fold
> > (or more) decrease in entity recognition.
> >
> > Thanks again for sharing your results with us. I think they are very
> > useful to the project.
> >
> >
> >
> > - Dave
> >
> >
> >
> > On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <[email protected]> wrote:
> >
> >
> >
> > Actually, we are working on a similar tool to compare it to the human
> > adjudicated standard for the set we tested against.  I didn't mention
> > it before because the tool isn't complete yet, but initial results for
> > the set (excluding those marked as "CUI-less") were as follows:
> >
> > Human adjudicated annotations: 4591 (excluding CUI-less)
> >
> > Annotations found matching the human adjudicated standard:
> > UMLSProcessor        2245
> > FastUMLSProcessor     215
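For reference, these counts translate directly into recall against the adjudicated set. A small sketch using only the figures quoted above, assuming each "matching" annotation corresponds to exactly one gold annotation (so the match count can be read as true positives); the variable and function names are illustrative:

```python
# Figures quoted in the message above.
gold_total = 4591  # human adjudicated annotations, excluding CUI-less
matched = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}


def recall(true_positives: int, gold: int) -> float:
    # Fraction of gold annotations that the pipeline found.
    return true_positives / gold


recalls = {name: recall(tp, gold_total) for name, tp in matched.items()}
for name, value in recalls.items():
    print(f"{name}: recall = {value:.3f}")

# How many times more gold annotations the older pipeline matched.
ratio = matched["UMLSProcessor"] / matched["FastUMLSProcessor"]
print(f"ratio = {ratio:.1f}")
```

That works out to roughly 0.489 for the UMLSProcessor and 0.047 for the FastUMLSProcessor, a ratio of about 10.4. Precision is left out here because the message does not say how many reported annotations fall within the annotation types covered by the gold standard.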
> >
> >
> >
> >
> > On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <[email protected]> wrote:
> >
> >
> >
> > Bruce,
> >
> > Thanks for this-- very useful.
> >
> > Perhaps Sean Finan can comment more, but it's also probably worth it
> > to compare to an adjudicated, human-annotated gold standard.
> >
> >
> >
> > --Pei
> >
> >
> >
> > -----Original Message-----
> > From: Bruce Tietjen [mailto:[email protected]]
> > Sent: Thursday, December 18, 2014 1:45 PM
> > To: [email protected]
> > Subject: cTakes Annotation Comparison
> >
> >
> >
> > With the recent release of cTakes 3.2.1, we were very interested in
> > checking for any differences in annotations between using the
> > AggregatePlaintextUMLSProcessor pipeline and the
> > AggregatePlaintextFastUMLSProcessor pipeline within this release of
> > cTakes with its associated set of UMLS resources.
> >
> > We chose to use the SHARE 14-a-b Training data that consists of 199
> > documents (Discharge 61, ECG 54, Echo 42 and Radiology 42) as the
> > basis for the comparison.
> >
> > We decided to share a summary of the results with the development
> > community.
> >
> >
> >
> > Documents Processed: 199
> >
> > Processing Time:
> > UMLSProcessor        2,439 seconds
> > FastUMLSProcessor    1,837 seconds
> >
> > Total Annotations Reported:
> > UMLSProcessor        20,365 annotations
> > FastUMLSProcessor     8,284 annotations
> >
> > Annotation Comparisons:
> > Annotations common to both sets:                       3,940
> > Annotations reported only by the UMLSProcessor:       16,425
> > Annotations reported only by the FastUMLSProcessor:    4,344
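The reported figures can be cross-checked against each other: each pipeline's total should equal the shared annotations plus the ones unique to it. A short sketch using only the numbers above; the Jaccard overlap and the speed-up ratio are summary measures chosen for this sketch and are not part of the original report:

```python
# Figures copied from the summary above.
total_umls, total_fast = 20_365, 8_284
common, only_umls, only_fast = 3_940, 16_425, 4_344
time_umls, time_fast = 2_439, 1_837  # seconds

# Each total should be the shared annotations plus the unique ones.
umls_consistent = (common + only_umls == total_umls)
fast_consistent = (common + only_fast == total_fast)

# Share of all distinct annotations that both pipelines agree on.
jaccard = common / (common + only_umls + only_fast)

# How much faster the fast pipeline ran on this document set.
speedup = time_umls / time_fast

print(umls_consistent, fast_consistent)
print(f"jaccard = {jaccard:.3f}, speedup = {speedup:.2f}x")
```

Both totals are consistent with the breakdown. The two pipelines agree on about 16% of the distinct annotations, and the fast pipeline ran about 1.33 times faster on these 199 documents.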
> >
> >
> >
> >
> >
> > If anyone is interested, the following was our test procedure:
> >
> > We used the UIMA CPE to process the document set twice, once using
> > the AggregatePlaintextUMLSProcessor pipeline and once using the
> > AggregatePlaintextFastUMLSProcessor pipeline. We used the
> > WriteCAStoFile CAS consumer to write the results to output files.
> >
> > We used a tool we recently developed to analyze and compare the
> > annotations generated by the two pipelines. The tool compares the
> > two outputs for each file and reports any differences in the
> > annotations (MedicationMention, SignSymptomMention,
> > ProcedureMention, AnatomicalSiteMention, and
> > DiseaseDisorderMention) between the two output sets. The tool
> > reports the number of 'matches' and 'misses' between each annotation
> > set. A 'match' is defined as the presence of an identified source
> > text interval with its associated CUI appearing in both annotation
> > sets. A 'miss' is defined as the presence of an identified source
> > text interval and its associated CUI in one annotation set, but no
> > matching identified source text interval and CUI in the other. The
> > tool also reports the total number of annotations (source text
> > intervals with associated CUIs) reported in each annotation set. The
> > compare tool is in our GitHub repository at
> > https://github.com/perfectsearch/cTAKES-compare
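The match/miss definition above amounts to set operations over (begin, end, CUI) triples. The following is a minimal sketch of that idea, not the code from the cTAKES-compare repository; the function name `compare_annotation_sets`, the tuple layout and the sample offsets and CUIs are all assumptions made for illustration:

```python
from typing import Dict, Iterable, Set, Tuple

# One annotation: (begin offset, end offset, CUI). Hypothetical layout.
Triple = Tuple[int, int, str]


def compare_annotation_sets(first: Iterable[Triple],
                            second: Iterable[Triple]) -> Dict[str, Set[Triple]]:
    a, b = set(first), set(second)
    return {
        "matches": a & b,      # same interval and CUI in both outputs
        "only_first": a - b,   # misses: present only in the first output
        "only_second": b - a,  # misses: present only in the second output
    }


# Made-up output of two pipelines for a single document.
umls_output = [(0, 8, "C0000001"), (15, 22, "C0000002"), (30, 41, "C0000003")]
fast_output = [(0, 8, "C0000001"), (15, 22, "C0000009"), (50, 55, "C0000004")]

result = compare_annotation_sets(umls_output, fast_output)
print(len(result["matches"]),
      len(result["only_first"]),
      len(result["only_second"]))
```

Because the comparison key includes the CUI, the same text interval annotated with a different concept counts as a miss on both sides, as the second sample annotation shows.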
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
