Sorry, my mistake, it was still running the old dictionary lookups. Since your earlier question, I have been trying to get the lookup-fast to work and have not yet been successful.
I made the change to AggregatePlaintextUMLSProcessor.xml:

<!--
<delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
  <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"/>
</delegateAnalysisEngine>
-->
<delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
  <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
</delegateAnalysisEngine>

But I've been getting the following exception and am still trying to figure out why:

Caused by: org.apache.uima.resource.ResourceInitializationException: Could not access the resource data at file:org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml.
    at org.apache.uima.resource.impl.DataResource_impl.initialize(DataResource_impl.java:127)
    at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:123)
    ... 31 more

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Oct 9, 2014 at 11:42 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> I just ran the -fast with an example containing bacitracin in four
> sentences, once being the first word and once being the last. In ten of
> ten runs all four bacitracin mentions were discovered.
>
> You completely replaced the dictionary lookup with the following?
>
> <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
>   <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
> </delegateAnalysisEngine>
>
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> Sent: Thursday, October 09, 2014 11:42 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in MedicationMention annotations on subsequent processing runs
>
> I tried the Dictionary-lookup-fast module and the behavior is the same. I
> did have to run it a number of times before timing was right to reproduce
> the issue.
> With the older lookup, chances were about 50/50 between which dictionary
> ran first. Using the dictionary-fast, it seems more like 70/30, with the
> standard umls lookup being more likely to run first than not. Which means
> that most of the time, there is no MedicationMention annotation for
> Bacitracin. (See Attached)
>
> The code with the issue is the DictionaryLookupAnnotator, which is a
> container for the dictionaries. It iterates through the list of lookup
> dictionaries, so that part of the code path does not seem to have changed.
> In the past, the rxNorm dictionary was a Lucene search and so I'm guessing
> it behaved a little differently than it does now with both being JDBC.
>
> The fact that the filter is at this location seems to indicate that it may
> have been intended to apply across all dictionaries. On the other hand, it
> appears to mask out the lookups for the different dictionaries, resulting
> in some annotations not being made.
>
> So, the real question is how should the filter work -- should the
> annotation filtering be per lookup dictionary, or be across all
> dictionaries? Or is there something wrong elsewhere that causes
>
> I lean towards having the filter function per dictionary. This may risk
> having duplicate annotations, but that would probably be better than
> missing the annotation altogether.
>
> On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> Hi Bruce,
>
> With Pei's help I just updated the sourceforge repo with the cTakes
> dictionaries.
> Checkout artifact ctakes-resources-snomed-rword-hsqldb-2011ab
>
> Sean
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> Sent: Wednesday, October 08, 2014 11:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in MedicationMention annotations on subsequent processing runs
>
> If I understand correctly, I would need new dictionary resources to run the
> rare word lookup method.
>
> Where can I find the necessary dictionary(ies) or how do I build them?
>
> On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Bruce,
> >
> > I would venture to say that this is neither expected nor desired.
> >
> > Before you fix it (or in addition to a fix), try to run with the new
> > dictionary lookup. It will have a different behavior, and it will be the
> > default dictionary lookup in future releases of cTakes – making fixes to
> > the current module slightly less urgent.
> >
> > Sean
> >
> > From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> > Sent: Wednesday, October 08, 2014 11:38 AM
> > To: dev@ctakes.apache.org
> > Subject: Differences in MedicationMention annotations on subsequent processing runs
> >
> > I have encountered a situation in which the cTakes clinical pipeline
> > output differs between multiple runs on the same text with the same
> > configuration.
> >
> > The following snippets from a single document are sufficient to
> > demonstrate the issue:
> >
> > a gentle curve going into. irrigated with Bacitracin.
> >
> > The source of the difference is that the DictionaryLookupAnnotator uses a
> > map to filter out duplicate annotations for a single document location:
> >
> > // used to prevent duplicate hits
> > // key = hit begin,end key (java.lang.String)
> > // val = Set of MetaDataHit objects
> > private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
> >
> > This map is shared between both the umls_ms_2011ab lookup and the
> > umls_ms_2011an_rxnorm lookup.
> >
> > If both dictionaries contain the same term, the order of dictionary lookup
> > execution determines the output. If the rxnorm lookup runs first, then a
> > MedicationMention annotation for Bacitracin appears in the final output. If
> > the standard umls lookup runs first, then there is no MedicationMention
> > annotation for Bacitracin.
> >
> > I will attach the output from the subsequent runs. (Hopefully the
> > attachment will make it through the system)
> >
> > Is this expected behavior? If not, what would be the expected behavior?
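
For anyone following the thread, the order dependence described in the original message can be reproduced in a few lines of plain Java. This is only a rough sketch under assumptions, not the cTAKES code: `DupFilterSketch`, `Hit`, `lookup`, and `run` are hypothetical names, the offsets are arbitrary, `OtherMention` is a placeholder for whatever annotation type the standard umls lookup would create, and a hit is treated as a duplicate when the same term was already recorded at the same begin,end key. The real annotator stores MetaDataHit objects and may compare them differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: these classes are hypothetical stand-ins, not cTAKES types.
final class DupFilterSketch {

    // One dictionary hit: the covered span, the matched term, and the
    // annotation type the owning dictionary would produce for it.
    static final class Hit {
        final int begin;
        final int end;
        final String term;
        final String annotationType;

        Hit(int begin, int end, String term, String annotationType) {
            this.begin = begin;
            this.end = end;
            this.term = term;
            this.annotationType = annotationType;
        }

        // Same shape as the "begin,end" key described in the quoted comment.
        String spanKey() {
            return begin + "," + end;
        }
    }

    // Runs the dictionaries in the given order and returns the annotation
    // types that survive duplicate filtering.
    static List<String> lookup(List<List<Hit>> dictionaries, boolean sharedFilter) {
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<String> annotations = new ArrayList<>();
        for (List<Hit> dictionary : dictionaries) {
            if (!sharedFilter) {
                dupMap = new HashMap<>(); // fresh filter for each dictionary
            }
            for (Hit hit : dictionary) {
                Set<String> seenTerms =
                        dupMap.computeIfAbsent(hit.spanKey(), k -> new HashSet<>());
                // Set.add returns false when the term was already recorded for this span.
                if (seenTerms.add(hit.term)) {
                    annotations.add(hit.annotationType);
                }
            }
        }
        return annotations;
    }

    // Two single-entry dictionaries that both contain "bacitracin" at the same span.
    static List<String> run(boolean sharedFilter, boolean rxnormFirst) {
        List<Hit> umls = Arrays.asList(new Hit(41, 51, "bacitracin", "OtherMention"));
        List<Hit> rxnorm = Arrays.asList(new Hit(41, 51, "bacitracin", "MedicationMention"));
        List<List<Hit>> order = rxnormFirst
                ? Arrays.asList(rxnorm, umls)
                : Arrays.asList(umls, rxnorm);
        return lookup(order, sharedFilter);
    }

    public static void main(String[] args) {
        System.out.println("shared, rxnorm first:         " + run(true, true));
        System.out.println("shared, umls first:           " + run(true, false));
        System.out.println("per dictionary, rxnorm first: " + run(false, true));
        System.out.println("per dictionary, umls first:   " + run(false, false));
    }
}
```

With the shared filter, whichever dictionary runs first claims the span, so the MedicationMention only survives when the rxnorm list is processed first. With a per-dictionary filter both hits survive in either order, which is the duplicate-annotation trade-off mentioned earlier in the thread.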