Bruce, I think we all feel a lot better now. I think the tool will be helpful moving forward.

I've updated the git repo with the fix in case anyone is interested.
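For anyone curious, the symptom Bruce describes below ("missing the last FSArray entry for any FSArray list") looks like the classic off-by-one in a loop over a UIMA FSArray. A minimal sketch of that shape -- illustrative only, the FsArrayWalk class is made up and the actual fix is in the repo:

    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.jcas.cas.FSArray;

    final class FsArrayWalk {

        // Buggy: "size() - 1" silently drops the last element of every
        // FSArray -- consistent with the missing-annotation symptom.
        static void walkBuggy(FSArray array) {
            for (int i = 0; i < array.size() - 1; i++) {
                FeatureStructure fs = array.get(i);
                // ... record fs for the comparison ...
            }
        }

        // Fixed: iterate the full range, indices 0 through size() - 1.
        static void walkFixed(FSArray array) {
            for (int i = 0; i < array.size(); i++) {
                FeatureStructure fs = array.get(i);
                // ... record fs for the comparison ...
            }
        }
    }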
IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
[email protected]

On 12/19/2014 03:04 PM, Bruce Tietjen wrote:

My apologies to Sean and everyone,

I am happy to report that I found a bug in our analysis tools that was missing the last FSArray entry for any FSArray list.

With the bug fixed, the results look MUCH better.

UMLSProcessor found 31,598 annotations
FastUMLSProcessor found 30,716 annotations

There were 23,522 annotations that were exact matches between the two.

When comparing with the gold standard annotations (4,591 annotations):

UMLSProcessor found 2,632 matches (2,735 including overlaps)
FastUMLSProcessor found 2,795 matches (2,842 including overlaps)

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen <[email protected]> wrote:

I'll do that -- there is always a possibility of bugs in the analysis tool.

On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean <[email protected]> wrote:

Sorry, I meant “Do some spot checks on the validity”. In other words, when your script reports that a CUI and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like.

Thanks,
Sean

From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:37 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in. There were no changes to any of the configuration files.

For this last run, I modified UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified files I used (but they may not get through the mailing list).

On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <[email protected]> wrote:

Hi Bruce,

I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap. Decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd only have two matches per document (100 docs?).

Thanks,
Sean

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:23 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Sean,

I tried the configuration changes you mentioned in your earlier email. The results are as follows:

Total annotations found: 12,161 (default configuration found 8,284)

Counting exact span matches, this run matched only 211 (default configuration matched 215).

Counting overlapping spans, this run matched only 220 (default configuration matched 224).
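(For clarity: an "overlapping spans" count treats a pair as matching when the two source text intervals intersect rather than coincide exactly, with the same CUI. A minimal sketch of such a check -- the SpanOverlap helper is hypothetical, not necessarily the tool's exact logic:)

    // Hypothetical helper, not the tool's actual code: two half-open
    // intervals [begin1, end1) and [begin2, end2) overlap iff each
    // starts before the other ends.
    final class SpanOverlap {

        static boolean overlaps(int begin1, int end1, int begin2, int end2) {
            return begin1 < end2 && begin2 < end1;
        }

        // A relaxed "overlap match" would still require the same CUI.
        static boolean overlapMatch(int begin1, int end1, String cui1,
                                    int begin2, int end2, String cui2) {
            return cui1.equals(cui2) && overlaps(begin1, end1, begin2, end2);
        }
    }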
Bruce

On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <[email protected]> wrote:

Kim,

Maintenance is the deciding factor in forging ahead, not bugs/issues. These are 2 components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix them and move forward.)

We can keep the old component around for as long as needed, but it's likely going to have limited support…

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 1:47 PM
To: Chen, Pei; [email protected]
Subject: Re: cTakes Annotation Comparison

Pei,

I don't think bugs/issues should be part of determining whether one algorithm or the other is superior. Obviously the bugs are worth mentioning, but if the fast lookup method has worse precision and recall but better performance than the slower but more accurate first-word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues.

Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now, I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced yet either.

On 12/19/2014 08:39 AM, Chen, Pei wrote:

Also check out the stats that Sean ran before releasing the new component:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned… I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now -- plagued with the bugs/issues you mentioned.)

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 10:25 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Guergana,

I'm curious about the number of records in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. At some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release:
https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually having only a single permutation, 1,2,3,4. Typically, though, this isn't easily detected in the first 100 or so documents. We discovered the issue when we made cTAKES produce consistent output of codes in our system.
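One way that kind of corruption can happen -- an illustrative guess, not the actual cTAKES source (see CTAKES-310 for the real change) -- is a shared, cached permutation array being sorted in place during a lookup:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Illustrative sketch of the aliasing pattern described above
    // (the class and method names are invented, not cTAKES code).
    public class PermutationCorruption {
        public static void main(String[] args) {
            List<int[]> cache = new ArrayList<int[]>();
            cache.add(new int[] {4, 2, 3, 1});

            // Buggy: canonicalizes the *shared* array in place, so the
            // cached entry becomes {1, 2, 3, 4} and 4,2,3,1 can never
            // match again until the process restarts.
            int[] candidate = cache.get(0);
            Arrays.sort(candidate);
            System.out.println(Arrays.toString(cache.get(0))); // [1, 2, 3, 4]

            // Fix pattern: sort a defensive copy; the cache stays intact.
            int[] copy = cache.get(0).clone();
            Arrays.sort(copy);
        }
    }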
On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the Fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana

-----Original Message-----
From: David Kincaid [mailto:[email protected]]
Sent: Friday, December 19, 2014 9:02 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic "Today": every time the word today appears in any text, it is annotated as a medication mention, when it almost never is being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <[email protected]> wrote:

Actually, we are working on a similar tool to compare it to the human adjudicated standard for the set we tested against.
I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding those marked as "CUI-less") were as follows:

Human adjudicated annotations: 4,591 (excluding CUI-less)

Annotations found matching the human adjudicated standard:
UMLSProcessor: 2,245
FastUMLSProcessor: 215

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <[email protected]> wrote:

Bruce,

Thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare to an adjudicated, human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Thursday, December 18, 2014 1:45 PM
To: [email protected]
Subject: cTakes Annotation Comparison

With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes, with its associated set of UMLS resources.

We chose to use the SHARE 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
UMLSProcessor: 2,439 seconds
FastUMLSProcessor: 1,837 seconds

Total annotations reported:
UMLSProcessor: 20,365 annotations
FastUMLSProcessor: 8,284 annotations

Annotation comparisons:
Annotations common to both sets: 3,940
Annotations reported only by the UMLSProcessor: 16,425
Annotations reported only by the FastUMLSProcessor: 4,344

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. The tool reports the number of 'matches' and 'misses' between each annotation set.
A 'match' is defined as the presence of an identified source text interval, with its associated CUI, in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) in each annotation set.

The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
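In essence, the comparison reduces to set operations on (begin, end, CUI) triples. A rough sketch of that logic -- the AnnotationCompare class is a toy illustration; the real code is in the repository above:

    import java.util.HashSet;
    import java.util.Set;

    // Toy illustration, not the actual tool: a 'match' is a
    // (begin, end, CUI) triple present in both annotation sets;
    // a 'miss' is a triple present in exactly one of them.
    public class AnnotationCompare {

        static String key(int begin, int end, String cui) {
            return begin + ":" + end + ":" + cui;
        }

        public static void main(String[] args) {
            Set<String> setA = new HashSet<String>();
            Set<String> setB = new HashSet<String>();
            setA.add(key(10, 18, "C0027051"));
            setB.add(key(10, 18, "C0027051"));
            setB.add(key(42, 47, "C0011849"));

            Set<String> matches = new HashSet<String>(setA);
            matches.retainAll(setB);          // triples in both sets

            Set<String> missesA = new HashSet<String>(setA);
            missesA.removeAll(setB);          // only in set A
            Set<String> missesB = new HashSet<String>(setB);
            missesB.removeAll(setA);          // only in set B

            System.out.println("matches=" + matches.size()
                    + " missesA=" + missesA.size()
                    + " missesB=" + missesB.size());
        }
    }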
