ah! Excellent news... that's much more in line with our experience and evaluation results.
On Fri, Dec 19, 2014 at 5:04 PM, Bruce Tietjen <[email protected]> wrote:

My apologies to Sean and everyone,

I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list.

With the bug fixed, the results look MUCH better:

UMLSProcessor found 31,598 annotations
FastUMLSProcessor found 30,716 annotations

There were 23,522 annotations that were exact matches between the two.

When comparing against the gold standard annotations (4,591 annotations):

UMLSProcessor found 2,632 matches (2,735 including overlaps)
FastUMLSProcessor found 2,795 matches (2,842 including overlaps)

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen <[email protected]> wrote:

I'll do that -- there is always a possibility of bugs in the analysis tool.

On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean <[email protected]> wrote:

Sorry, I meant "do some spot checks on the validity". In other words, when your script reports that a CUI and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like.

Thanks,
Sean
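Beyond eyeballing a file in the CVD, a spot check like the one Sean describes can also be scripted against the UIMA API. A minimal sketch, assuming a type system descriptor named TypeSystem.xml and an .xmi path passed as the first argument (both file names are assumptions, not something from this thread); the CUIs themselves live in each mention's ontologyConceptArr FSArray, the kind of list Bruce's analysis bug was truncating:

    import java.io.FileInputStream;
    import org.apache.uima.UIMAFramework;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.impl.XmiCasDeserializer;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.util.CasCreationUtils;
    import org.apache.uima.util.XMLInputSource;

    public class XmiSpotCheck {
        public static void main(String[] args) throws Exception {
            // Load the same type system the pipeline used (path is an assumption).
            TypeSystemDescription tsd = UIMAFramework.getXMLParser()
                .parseTypeSystemDescription(new XMLInputSource("TypeSystem.xml"));
            CAS cas = CasCreationUtils.createCas(tsd, null, null);

            // Deserialize one pipeline output .xmi into the CAS.
            try (FileInputStream in = new FileInputStream(args[0])) {
                XmiCasDeserializer.deserialize(in, cas);
            }

            // Print every annotation's type, span, and covered text for manual review.
            for (AnnotationFS ann : cas.getAnnotationIndex()) {
                System.out.printf("%s [%d,%d] '%s'%n", ann.getType().getName(),
                        ann.getBegin(), ann.getEnd(), ann.getCoveredText());
            }
        }
    }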
From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:37 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

My original results were from a newly downloaded cTAKES 3.2.1 with the separately downloaded resources copied in; there were no changes to any of the configuration files.

For this last run, I modified UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified ones I used (but they may not get through the mailing list).

On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <[email protected]> wrote:

Hi Bruce,

I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap, so decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd only have two matches per document (100 docs?).

Thanks,
Sean

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Friday, December 19, 2014 3:23 PM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Sean,

I tried the configuration changes you mentioned in your earlier email. The results are as follows:

Total annotations found: 12,161 (default configuration found 8,284)

Counting exact span matches, this run matched only 211 (default configuration matched 215).

Counting overlapping spans, this run matched only 220 (default configuration matched 224).

Bruce

On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <[email protected]> wrote:

Kim,

Maintenance is the deciding factor in forging ahead, not bugs/issues.

They are 2 components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix them and move forward.)

We can keep the old component around for as long as needed, but it's likely going to have limited support...

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 1:47 PM
To: Chen, Pei; [email protected]
Subject: Re: cTakes Annotation Comparison

Pei,

I don't think bugs/issues should be part of determining whether one algorithm is superior to the other. Obviously it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance than the slower, more accurate first-word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues.

Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced yet either.

On 12/19/2014 08:39 AM, Chen, Pei wrote:

Also check out the stats that Sean ran before releasing the new component:

http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned... I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now -- plagued with bugs/issues as you mentioned.)

--Pei

From: Kim Ebert [mailto:[email protected]]
Sent: Friday, December 19, 2014 10:25 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Guergana,

I'm curious about the number of records in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release: https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually being left with only the single permutation 1,2,3,4. Typically, though, this isn't easily detected in the first 100 or so documents.

We discovered this issue when we made cTAKES produce consistent output of codes in our system.
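The corruption Kim describes is a classic shared-mutable-state bug: if the lookup normalizes a stored permutation in place while matching, the stored order is destroyed for every later document. An illustrative sketch of that failure mode and the defensive-copy fix -- this is not the actual cTAKES code, and the in-place sort stands in for whatever the real normalization step was (see CTAKES-310 for the real change):

    import java.util.Arrays;
    import java.util.List;

    public class PermutationCorruption {

        // Permutation patterns shared across all documents, as in the old lookup.
        static final List<int[]> PERMUTATIONS = List.of(new int[]{4, 2, 3, 1});

        // Buggy matcher: sorting the shared array in place corrupts 4,2,3,1 into
        // 1,2,3,4, so the original order can never match again until restart.
        static boolean matchesBuggy(int[] tokenOrder) {
            for (int[] perm : PERMUTATIONS) {
                Arrays.sort(perm);                  // mutates the shared pattern!
                if (Arrays.equals(perm, tokenOrder)) return true;
            }
            return false;
        }

        // Fixed matcher: normalize a defensive copy; the stored pattern survives.
        static boolean matchesFixed(int[] tokenOrder) {
            for (int[] perm : PERMUTATIONS) {
                int[] copy = perm.clone();
                Arrays.sort(copy);                  // hypothetical normalization step
                if (Arrays.equals(copy, tokenOrder)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            matchesBuggy(new int[]{1, 2, 3, 4});
            // The stored pattern is now corrupted for every later document:
            System.out.println(Arrays.toString(PERMUTATIONS.get(0))); // [1, 2, 3, 4]
        }
    }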
On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below; the P, R and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana

From: David Kincaid [mailto:[email protected]]
Sent: Friday, December 19, 2014 9:02 AM
To: [email protected]
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic "Today": every time the word today appears in any text, it is annotated as a medication mention, when it is almost never being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave
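For anyone who wants to put P, R and F1 on numbers like these, the arithmetic follows directly from the match counts. A minimal sketch using Bruce's initial figures quoted below (4,591 gold annotations; 2,245 and 215 matches; 20,365 and 8,284 total system annotations) -- treating each system's total annotation count as the precision denominator is an assumption about how the counts line up, not something stated in the thread:

    public class PrfMetrics {

        static void report(String name, int matches, int systemTotal, int goldTotal) {
            double precision = (double) matches / systemTotal; // matched / all system hits
            double recall    = (double) matches / goldTotal;   // matched / all gold spans
            double f1        = 2 * precision * recall / (precision + recall);
            System.out.printf("%-18s P=%.3f R=%.3f F1=%.3f%n", name, precision, recall, f1);
        }

        public static void main(String[] args) {
            int gold = 4591; // human-adjudicated annotations, excluding CUI-less
            report("UMLSProcessor", 2245, 20365, gold);
            report("FastUMLSProcessor", 215, 8284, gold);
        }
    }

On these inputs the recall gap (roughly 0.49 vs 0.05) is the ten-fold difference Dave describes.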
On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <[email protected]> wrote:

Actually, we are working on a similar tool to compare against the human-adjudicated standard for the set we tested. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding annotations marked "CUI-less") were as follows:

Human-adjudicated annotations: 4,591 (excluding CUI-less)

Annotations found matching the human-adjudicated standard:
UMLSProcessor          2,245
FastUMLSProcessor        215

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <[email protected]> wrote:

Bruce,

Thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare against an adjudicated, human-annotated gold standard.

--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:[email protected]]
Sent: Thursday, December 18, 2014 1:45 PM
To: [email protected]
Subject: cTakes Annotation Comparison

With the recent release of cTAKES 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release, with its associated set of UMLS resources.

We chose the SHARE 14-a-b training data, 199 documents (Discharge 61, ECG 54, Echo 42, Radiology 42), as the basis for the comparison, and decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
UMLSProcessor          2,439 seconds
FastUMLSProcessor      1,837 seconds

Total annotations reported:
UMLSProcessor          20,365 annotations
FastUMLSProcessor       8,284 annotations

Annotation comparisons:
Annotations common to both sets:                      3,940
Annotations reported only by the UMLSProcessor:      16,425
Annotations reported only by the FastUMLSProcessor:   4,344

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once with the AggregatePlaintextUMLSProcessor pipeline and once with the AggregatePlaintextFastUMLSProcessor pipeline, and used the WriteCAStoFile CAS consumer to write the results to output files.
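A CPE run like this can also be driven headless instead of through the CPE GUI; a minimal sketch along the lines of UIMA's SimpleRunCPE example (the descriptor path is an assumption):

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.collection.CollectionProcessingEngine;
    import org.apache.uima.collection.metadata.CpeDescription;
    import org.apache.uima.util.XMLInputSource;

    public class RunCpe {
        public static void main(String[] args) throws Exception {
            // Parse the CPE descriptor that wires reader -> pipeline -> WriteCAStoFile.
            CpeDescription desc = UIMAFramework.getXMLParser()
                .parseCpeDescription(new XMLInputSource("desc/MyCpe.xml"));

            // Instantiate and start the collection processing engine.
            CollectionProcessingEngine cpe =
                UIMAFramework.produceCollectionProcessingEngine(desc);
            cpe.process(); // runs asynchronously; add a StatusCallbackListener to track completion
        }
    }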
We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. It reports the number of 'matches' and 'misses' between the annotation sets. A 'match' is defined as an identified source text interval, with its associated CUI, appearing in both annotation sets. A 'miss' is defined as an identified source text interval and associated CUI present in one annotation set with no matching source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) in each annotation set. The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
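That match/miss definition reduces to set operations over (begin, end, CUI) triples. A minimal sketch of the core comparison (hypothetical code with made-up sample CUIs, not the contents of the cTAKES-compare repository):

    import java.util.HashSet;
    import java.util.Set;

    public class AnnotationCompare {

        // One identified source text interval with its associated CUI.
        record Hit(int begin, int end, String cui) {}

        public static void main(String[] args) {
            Set<Hit> a = Set.of(new Hit(0, 5, "C0011849"),
                                new Hit(10, 17, "C0020538"));
            Set<Hit> b = Set.of(new Hit(0, 5, "C0011849"));

            // Matches: span + CUI present in both annotation sets.
            Set<Hit> matches = new HashSet<>(a);
            matches.retainAll(b);

            // Misses: span + CUI present in one set but not the other.
            Set<Hit> onlyA = new HashSet<>(a); onlyA.removeAll(b);
            Set<Hit> onlyB = new HashSet<>(b); onlyB.removeAll(a);

            System.out.printf("matches=%d onlyA=%d onlyB=%d (totals %d/%d)%n",
                matches.size(), onlyA.size(), onlyB.size(), a.size(), b.size());
        }
    }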
