Sean, I don't think that would be an issue since both the rare word lookup and the first word lookup are using UMLS 2011AB. Or is the rare word lookup using a different dictionary?
I would expect roughly similar results between the two when it comes to differences between UMLS versions.

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com <mailto:greg.hub...@imatsolutions.com>

On 12/19/2014 11:31 AM, Finan, Sean wrote:
> One quick mention:
>
> The cTakes dictionaries are built with UMLS 2011AB. If the Human annotations
> were not done using the same UMLS version then there WILL be differences in
> CUI and Semantic group. I don't have time to go into it with details,
> examples, etc. Just be aware that every 6 months CUIs are added, removed,
> deprecated, and moved from one TUI to another.
>
> Sean
>
> -----Original Message-----
> From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu]
> Sent: Friday, December 19, 2014 1:28 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTakes Annotation Comparison
>
> Several thoughts:
> 1. The ShARe corpus annotates only mentions of type Diseases/Disorders and
> only Anatomical Sites associated with a Disease/Disorder. This is by design.
> cTAKES annotates all mentions of types Diseases/Disorders, Signs/Symptoms,
> Procedures, Medications and Anatomical Sites. Therefore you will get MANY
> more annotations with cTAKES. Eventually the ShARe corpus will be expanded to
> the other types.
>
> 2. Keeping (1) in mind, you can approximately estimate the
> precision/recall/f1 of cTAKES on the ShARe corpus if you output only mentions
> of type Disease/Disorder.
>
> 3. Could you send us the list of files you use from ShARe to test? We have
> the corpus and would like to run against it as well.
>
> Hope this makes sense...
> --Guergana
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> Sent: Friday, December 19, 2014 1:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTakes Annotation Comparison
>
> Our analysis against the human adjudicated gold standard from this SHARE
> corpus is using a simple check to see if the cTakes output included the
> annotation specified by the gold standard. The initial results I reported
> were for exact matches of CUI and text span. Only exact matches were counted.
>
> It looks like if we also count as matches cTakes annotations with a matching
> CUI and a text span that overlaps the gold standard text span then the
> matches increase to 224 matching annotations for the FastUMLS pipeline and
> 2319 for the old pipeline.
>
> The question was also asked about annotations in the cTakes output that were
> not in the human adjudicated gold standard. The answer is yes, there were a
> lot of additional annotations made by cTakes that don't appear to be in the
> gold standard. We haven't analyzed that yet, but it looks like the gold
> standard we are using may only have Disease_Disorder annotations.
>
> IMAT Solutions <http://imatsolutions.com>
> Bruce Tietjen
> Senior Software Engineer
> Mobile: 801.634.1547
> bruce.tiet...@imatsolutions.com
>
> On Fri, Dec 19, 2014 at 9:54 AM, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
>> Thanks Kim,
>> This sounds interesting though I don't totally understand it. Are you
>> saying that extraction performance for a given note depends on which
>> order the note was in the processing queue? If so that's pretty bad!
>> If you (or anyone else who understands this issue) have a concrete
>> example I think that might help me understand what the problem is/was.
>>
>> Even though, as Pei mentioned, we are going to try moving the
>> community to the faster dictionary, I would like to understand better
>> just to help myself avoid issues of this type going forward (and
>> verify the new dictionary doesn't use similar logic).
>>
>> Also, when we finish annotating the sample notes, might we use that as
>> a point of comparison for the two dictionaries? That would get around
>> the issue that not everyone has access to the datasets we used for
>> validation and others are likely not able to share theirs either. And
>> maybe we can replicate the notes if we want to simulate the scenario
>> Kim is talking about with thousands or more notes.
>>
>> Tim
>>
>>
>> On 12/19/2014 10:24 AM, Kim Ebert wrote:
>> Guergana,
>>
>> I'm curious about the number of records that are in your gold standard
>> sets, or if your gold standard set was run through a long running cTAKES
>> process.
>> I know at some point we fixed a bug in the old dictionary lookup that
>> caused the permutations to become corrupted over time. Typically this
>> isn't seen in the first few records, but over time as patterns are
>> used the permutations would become corrupted. This caused documents
>> that were fed through cTAKES more than once to have fewer codes
>> returned than the first time.
>>
>> For example, if a permutation of 4,2,3,1 was found, the permutation
>> would be corrupted to be 1,2,3,4. It would no longer be possible to
>> detect permutations of 4,2,3,1 until cTAKES was restarted. We got the
>> fix in after the cTAKES 3.2.0 release.
>> https://issues.apache.org/jira/browse/CTAKES-310
>> Depending upon the corpus size, I could see the permutation engine
>> eventually only having a single permutation of 1,2,3,4.
>>
>> Typically though, this isn't very easily detected in the first 100 or
>> so documents.
>>
>> We discovered this issue when we made cTAKES have consistent output of
>> codes in our system.
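
For anyone trying to picture the failure mode Kim describes, here is a minimal sketch. It is not the cTAKES code, and the real cause and fix are in CTAKES-310. It only assumes, for illustration, that the corruption came from sorting a cached permutation in place, which is one common way long-lived shared state ends up like this. The function names and the cache layout are hypothetical.

```python
def canonical_orders_buggy(cache, size):
    """Return each cached permutation in sorted order. Mutates the cache."""
    result = []
    for perm in cache[size]:
        perm.sort()  # BUG (assumed mechanism): in-place sort changes the shared cached list
        result.append(perm)
    return result


def canonical_orders_fixed(cache, size):
    """Return sorted copies and leave the cached permutations untouched."""
    return [sorted(perm) for perm in cache[size]]


def make_cache():
    # Stands in for state that lives as long as the process does.
    return {4: [[4, 2, 3, 1], [1, 2, 3, 4]]}


buggy_cache = make_cache()
canonical_orders_buggy(buggy_cache, 4)
# The 4,2,3,1 ordering is gone until the cache is rebuilt (a "restart").
print(buggy_cache[4])  # [[1, 2, 3, 4], [1, 2, 3, 4]]

fixed_cache = make_cache()
canonical_orders_fixed(fixed_cache, 4)
print(fixed_cache[4])  # [[4, 2, 3, 1], [1, 2, 3, 4]]
```

Under that assumption the first document sees every permutation and later documents see fewer, which would fit the observation that the problem is hard to spot in the first 100 or so documents.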
>>
>> IMAT Solutions <http://imatsolutions.com>
>> Kim Ebert
>> Software Engineer
>> Office: 801.669.7342
>> kim.eb...@imatsolutions.com <mailto:greg.hub...@imatsolutions.com>
>>
>> On 12/19/2014 07:05 AM, Savova, Guergana wrote:
>>
>> We are doing a similar kind of evaluation and will report the results.
>>
>> Before we released the Fast lookup, we did a systematic evaluation
>> across three gold standard sets. We did not see the trend that Bruce
>> reported below. The P, R and F1 results from the old dictionary look
>> up and the fast one were similar.
>>
>> Thank you everyone!
>> --Guergana
>>
>> -----Original Message-----
>> From: David Kincaid [mailto:kincaid.d...@gmail.com]
>> Sent: Friday, December 19, 2014 9:02 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: cTakes Annotation Comparison
>>
>> Thanks for this, Bruce! Very interesting work. It confirms what I've
>> seen in my small tests that I've done in a non-systematic way. Did you
>> happen to capture the number of false positives yet (annotations made
>> by cTAKES that are not in the human adjudicated standard)? I've seen a
>> lot of dictionary hits that are not actually entity mentions, but I
>> haven't had a chance to do a systematic analysis (we're working on our
>> annotated gold standard now). One great example is the antibiotic
>> "Today". Every time the word today appears in any text it is annotated
>> as a medication mention when it almost never is being used in that sense.
>>
>> These results by themselves are quite disappointing to me. Both the
>> UMLSProcessor and especially the FastUMLSProcessor seem to have pretty
>> poor recall. It seems like the trade off for more speed is a ten-fold
>> (or more) decrease in entity recognition.
>>
>> Thanks again for sharing your results with us. I think they are very
>> useful to the project.
>>
>> - Dave
>>
>> On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com> wrote:
>>
>> Actually, we are working on a similar tool to compare it to the human
>> adjudicated standard for the set we tested against. I didn't mention
>> it before because the tool isn't complete yet, but initial results for
>> the set (excluding those marked as "CUI-less") were as follows:
>>
>> Human adjudicated annotations: 4591 (excluding CUI-less)
>>
>> Annotations found matching the human adjudicated standard:
>>     UMLSProcessor        2245
>>     FastUMLSProcessor     215
>>
>> IMAT Solutions <http://imatsolutions.com>
>> Bruce Tietjen
>> Senior Software Engineer
>> Mobile: 801.634.1547
>> bruce.tiet...@imatsolutions.com
>>
>> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
>>
>> Bruce,
>> Thanks for this-- very useful.
>> Perhaps Sean Finan can comment more,
>> but it's also probably worth it to compare to an adjudicated human
>> annotated gold standard.
>>
>> --Pei
>>
>> -----Original Message-----
>> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
>> Sent: Thursday, December 18, 2014 1:45 PM
>> To: dev@ctakes.apache.org
>> Subject: cTakes Annotation Comparison
>>
>> With the recent release of cTakes 3.2.1, we were very interested in
>> checking for any differences in annotations between using the
>> AggregatePlaintextUMLSProcessor pipeline and the
>> AggregatePlaintextFastUMLSProcessor pipeline within this release of
>> cTakes with its associated set of UMLS resources.
>>
>> We chose to use the SHARE 14-a-b Training data that consists of 199
>> documents (Discharge 61, ECG 54, Echo 42 and Radiology 42) as the
>> basis for the comparison.
>>
>> We decided to share a summary of the results with the development
>> community.
>>
>> Documents Processed: 199
>>
>> Processing Time:
>>     UMLSProcessor        2,439 seconds
>>     FastUMLSProcessor    1,837 seconds
>>
>> Total Annotations Reported:
>>     UMLSProcessor        20,365 annotations
>>     FastUMLSProcessor     8,284 annotations
>>
>> Annotation Comparisons:
>>     Annotations common to both sets:                      3,940
>>     Annotations reported only by the UMLSProcessor:      16,425
>>     Annotations reported only by the FastUMLSProcessor:   4,344
>>
>> If anyone is interested, the following was our test procedure:
>>
>> We used the UIMA CPE to process the document set twice, once using the
>> AggregatePlaintextUMLSProcessor pipeline and once using the
>> AggregatePlaintextFastUMLSProcessor pipeline. We used the
>> WriteCAStoFile CAS consumer to write the results to output files.
>>
>> We used a tool we recently developed to analyze and compare the
>> annotations generated by the two pipelines. The tool compares the two
>> outputs for each file and reports any differences in the annotations
>> (MedicationMention, SignSymptomMention, ProcedureMention,
>> AnatomicalSiteMention, and
>> DiseaseDisorderMention) between the two output sets. The tool reports
>> the number of 'matches' and 'misses' between each annotation set.
>> A 'match' is
>> defined as the presence of an identified source text interval with its
>> associated CUI appearing in both annotation sets. A 'miss' is defined
>> as the presence of an identified source text interval and its
>> associated CUI in one annotation set, but no matching identified
>> source text interval and
>> CUI in the other. The tool also reports the total number of
>> annotations (source text intervals with associated CUIs) reported in
>> each annotation set. The compare tool is in our GitHub repository at
>> https://github.com/perfectsearch/cTAKES-compare
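
To make the 'match' definition concrete, along with the overlapping-span variant mentioned earlier in the thread, here is a rough sketch. It is not taken from the cTAKES-compare tool. `Annotation`, `exact_matches`, `overlap_matches` and `recall` are hypothetical names, and the toy offsets and CUIs are made up. The last two lines only redo the arithmetic on the exact-match counts reported above.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    begin: int  # start offset of the source text interval
    end: int    # end offset (exclusive)
    cui: str    # UMLS concept identifier


def exact_matches(gold, system):
    """Count gold annotations whose exact span and CUI appear in the system output."""
    system_set = set(system)
    return sum(1 for g in gold if g in system_set)


def overlap_matches(gold, system):
    """Count gold annotations that share a CUI with an overlapping system span."""
    return sum(
        1
        for g in gold
        if any(s.cui == g.cui and s.begin < g.end and g.begin < s.end for s in system)
    )


def recall(matched, gold_total):
    """Fraction of gold annotations that the system found."""
    return matched / gold_total if gold_total else 0.0


# Toy data: offsets and CUIs are made up for illustration.
gold = [
    Annotation(0, 9, "C0000001"),
    Annotation(20, 32, "C0000002"),
    Annotation(40, 46, "C0000003"),
]
system = [
    Annotation(0, 9, "C0000001"),    # same span, same CUI: exact match
    Annotation(20, 28, "C0000002"),  # same CUI, shorter span: overlap match only
    Annotation(40, 46, "C0000009"),  # same span, different CUI: miss
]

print(exact_matches(gold, system))    # 1
print(overlap_matches(gold, system))  # 2

# Recall from the exact-match counts reported in the thread (4,591 gold annotations).
print(round(recall(2245, 4591), 3))  # 0.489 (UMLSProcessor)
print(round(recall(215, 4591), 3))   # 0.047 (FastUMLSProcessor)
```

Precision is left out on purpose: as noted in the thread, the gold standard appears to cover only Disease/Disorder mentions, so counting every other cTAKES annotation as a false positive would not be meaningful without first filtering the output to that type.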