Rather than spam the mailing list with the full list of filenames in the set we used, I would be happy to send it privately to anyone interested.
IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 11:47 AM, Kim Ebert <[email protected]> wrote:

> Pei,
>
> I don't think bugs/issues should be part of determining whether one
> algorithm is superior to the other. Obviously, it is worth mentioning the
> bugs, but if the fast lookup method has worse precision and recall but
> better performance than the slower but more accurate first-word lookup
> algorithm, then time should be invested in fixing those bugs and
> resolving those weird issues.
>
> Now I'm not saying which one is superior in this case, as the data will
> end up speaking for itself one way or the other; but as of right now, I'm
> not convinced yet that the old dictionary lookup is obsolete, and I'm not
> sure the community is convinced yet either.
>
> IMAT Solutions <http://imatsolutions.com>
> Kim Ebert
> Software Engineer
> Office: 801.669.7342
> [email protected]
>
> On 12/19/2014 08:39 AM, Chen, Pei wrote:
>
> Also check out the stats that Sean ran before releasing the new
> component:
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx
>
> From the evaluation and experience, the new lookup algorithm should be a
> huge improvement in terms of both speed and accuracy. This is very
> different from what Bruce mentioned… I'm sure Sean will chime in here.
> (The old dictionary lookup is essentially obsolete now- plagued with
> bugs/issues as you mentioned.)
>
> --Pei
>
> *From:* Kim Ebert [mailto:[email protected]]
> *Sent:* Friday, December 19, 2014 10:25 AM
> *To:* [email protected]
> *Subject:* Re: cTakes Annotation Comparison
>
> Guergana,
>
> I'm curious about the number of records in your gold standard sets, and
> whether your gold standard set was run through a long-running cTAKES
> process.
> I know at some point we fixed a bug in the old dictionary lookup that
> caused the permutations to become corrupted over time. Typically this
> isn't seen in the first few records, but over time, as patterns are used,
> the permutations would become corrupted. This caused documents that were
> fed through cTAKES more than once to have fewer codes returned than the
> first time.
>
> For example, if a permutation of 4,2,3,1 was found, the permutation would
> be corrupted to 1,2,3,4. It would no longer be possible to detect
> permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in
> after the cTAKES 3.2.0 release.
> https://issues.apache.org/jira/browse/CTAKES-310
> Depending upon the corpus size, I could see the permutation engine
> eventually having only a single permutation of 1,2,3,4.
>
> Typically, though, this isn't easily detected in the first 100 or so
> documents.
>
> We discovered this issue when we made cTAKES produce consistent output
> of codes in our system.
>
> IMAT Solutions <http://imatsolutions.com>
> Kim Ebert
> Software Engineer
> Office: 801.669.7342
> [email protected]
>
> On 12/19/2014 07:05 AM, Savova, Guergana wrote:
>
> We are doing a similar kind of evaluation and will report the results.
>
> Before we released the Fast lookup, we did a systematic evaluation
> across three gold standard sets. We did not see the trend that Bruce
> reported below. The P, R and F1 results from the old dictionary lookup
> and the fast one were similar.
>
> Thank you everyone!
> --Guergana
>
> -----Original Message-----
> From: David Kincaid [mailto:[email protected]]
> Sent: Friday, December 19, 2014 9:02 AM
> To: [email protected]
> Subject: Re: cTakes Annotation Comparison
>
> Thanks for this, Bruce! Very interesting work. It confirms what I've
> seen in the small, non-systematic tests that I've done.
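The corruption Kim describes above (a stored permutation of 4,2,3,1 silently becoming 1,2,3,4 until restart) is the classic signature of an in-place sort applied to shared state. A minimal sketch of that failure mode, using hypothetical names rather than the actual cTAKES code from CTAKES-310:

```python
# Sketch of the aliasing bug described above (hypothetical names,
# not the actual cTAKES implementation).

class PermutationEngine:
    def __init__(self):
        # Shared table of word-order permutations, built once at startup.
        self.permutations = [[4, 2, 3, 1], [2, 1], [3, 1, 2]]

    def lookup_buggy(self):
        """Buggy: sorts each shared permutation list in place, so the
        pattern [4, 2, 3, 1] is silently replaced by [1, 2, 3, 4]."""
        hits = []
        for perm in self.permutations:
            perm.sort()  # mutates the shared table!
            hits.append(tuple(perm))
        return hits

    def lookup_fixed(self):
        """Fixed: sorts a copy, leaving the shared table intact."""
        return [tuple(sorted(perm)) for perm in self.permutations]

engine = PermutationEngine()
engine.lookup_buggy()
# After one buggy lookup, [4, 2, 3, 1] is gone from the table for good:
assert [4, 2, 3, 1] not in engine.permutations
assert [1, 2, 3, 4] in engine.permutations
```

This matches the symptom Kim reports: the first pass over a document still works, but any later document needing the 4,2,3,1 pattern comes back with fewer codes, and the effect compounds as more patterns are exercised.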
> Did you happen to capture the number of false positives yet (annotations
> made by cTAKES that are not in the human-adjudicated standard)? I've
> seen a lot of dictionary hits that are not actually entity mentions, but
> I haven't had a chance to do a systematic analysis (we're working on our
> annotated gold standard now). One great example is the antibiotic
> "Today". Every time the word today appears in any text, it is annotated
> as a medication mention, when it almost never is being used in that
> sense.
>
> These results by themselves are quite disappointing to me. Both the
> UMLSProcessor and especially the FastUMLSProcessor seem to have pretty
> poor recall. It seems like the trade-off for more speed is a ten-fold
> (or more) decrease in entity recognition.
>
> Thanks again for sharing your results with us. I think they are very
> useful to the project.
>
> - Dave
>
> On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen
> <[email protected]> wrote:
>
> Actually, we are working on a similar tool to compare it to the
> human-adjudicated standard for the set we tested against. I didn't
> mention it before because the tool isn't complete yet, but initial
> results for the set (excluding those marked as "CUI-less") were as
> follows:
>
> Human adjudicated annotations: 4591 (excluding CUI-less)
>
> Annotations found matching the human adjudicated standard:
>     UMLSProcessor       2245
>     FastUMLSProcessor    215
>
> IMAT Solutions <http://imatsolutions.com>
> Bruce Tietjen
> Senior Software Engineer
> Mobile: 801.634.1547
> [email protected]
>
> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei
> <[email protected]> wrote:
>
> Bruce,
> Thanks for this-- very useful.
> Perhaps Sean Finan can comment more-
> but it's also probably worth it to compare to an adjudicated
> human-annotated gold standard.
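From the counts Bruce posted, approximate recall against the human-adjudicated standard can be computed directly; precision and F1 would additionally need the false-positive counts Dave asks about. A quick back-of-the-envelope check:

```python
# Recall against the human-adjudicated standard, using the counts posted
# above (precision/F1 would also need the false-positive counts).
gold = 4591  # adjudicated annotations, excluding CUI-less

for name, matches in [("UMLSProcessor", 2245), ("FastUMLSProcessor", 215)]:
    recall = matches / gold
    print(f"{name}: recall = {recall:.3f}")
# UMLSProcessor: recall = 0.489
# FastUMLSProcessor: recall = 0.047
```

The roughly 0.489 vs 0.047 ratio is the "ten-fold (or more) decrease" Dave refers to, though note these numbers predate the CTAKES-310 fix discussion above and exclude CUI-less annotations.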
> --Pei
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:[email protected]]
> Sent: Thursday, December 18, 2014 1:45 PM
> To: [email protected]
> Subject: cTakes Annotation Comparison
>
> With the recent release of cTAKES 3.2.1, we were very interested in
> checking for any differences in annotations between the
> AggregatePlaintextUMLSProcessor pipeline and the
> AggregatePlaintextFastUMLSProcessor pipeline within this release of
> cTAKES and its associated set of UMLS resources.
>
> We chose to use the SHARE 14-a-b Training data, which consists of 199
> documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the
> basis for the comparison.
>
> We decided to share a summary of the results with the development
> community.
>
> Documents processed: 199
>
> Processing time:
>     UMLSProcessor        2,439 seconds
>     FastUMLSProcessor    1,837 seconds
>
> Total annotations reported:
>     UMLSProcessor       20,365 annotations
>     FastUMLSProcessor    8,284 annotations
>
> Annotation comparisons:
>     Annotations common to both sets:                     3,940
>     Annotations reported only by the UMLSProcessor:     16,425
>     Annotations reported only by the FastUMLSProcessor:  4,344
>
> If anyone is interested, the following was our test procedure:
>
> We used the UIMA CPE to process the document set twice, once using the
> AggregatePlaintextUMLSProcessor pipeline and once using the
> AggregatePlaintextFastUMLSProcessor pipeline. We used the
> WriteCAStoFile CAS consumer to write the results to output files.
>
> We used a tool we recently developed to analyze and compare the
> annotations generated by the two pipelines. The tool compares the two
> outputs for each file and reports any differences in the annotations
> (MedicationMention, SignSymptomMention, ProcedureMention,
> AnatomicalSiteMention, and DiseaseDisorderMention) between the two
> output sets.
> The tool reports the number of 'matches' and 'misses' between each
> annotation set. A 'match' is defined as the presence of an identified
> source text interval with its associated CUI appearing in both
> annotation sets. A 'miss' is defined as the presence of an identified
> source text interval and its associated CUI in one annotation set, but
> no matching identified source text interval and CUI in the other. The
> tool also reports the total number of annotations (source text intervals
> with associated CUIs) reported in each annotation set. The compare tool
> is in our GitHub repository at
> https://github.com/perfectsearch/cTAKES-compare
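The match/miss definition Bruce gives reduces to set operations over (source text interval, CUI) pairs. A minimal sketch of that comparison logic, with a hypothetical data layout (the actual tool is the cTAKES-compare repository linked above):

```python
# Sketch of the 'match'/'miss' definition described above: a match is a
# (source-text interval, CUI) pair present in both annotation sets.
# Hypothetical tuple layout, not the cTAKES-compare implementation.

def compare(annotations_a, annotations_b):
    """Each input is a collection of (begin, end, cui) tuples."""
    a, b = set(annotations_a), set(annotations_b)
    return {
        "matches": len(a & b),   # same interval and CUI in both sets
        "only_a": len(a - b),    # misses: present only in set A
        "only_b": len(b - a),    # misses: present only in set B
    }

umls = {(0, 5, "C0011849"), (10, 15, "C0020538")}
fast = {(0, 5, "C0011849")}
print(compare(umls, fast))
# {'matches': 1, 'only_a': 1, 'only_b': 0}
```

Note that under this definition an annotation with the right span but a different CUI counts as a miss in both directions, which is one plausible contributor to the large "only by" counts in the summary above.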
