Hi Bruce,

With Pei's help I just updated the SourceForge repo with the cTakes dictionaries.
Check out the artifact ctakes-resources-snomed-rword-hsqldb-2011ab

Sean

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:52 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs

If I understand correctly, I would need new dictionary resources to run the rare word lookup method.

Where can I find the necessary dictionary(ies) or how do I build them?

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Bruce,
>
> I would venture to say that this is neither expected nor desired.
>
> Before you fix it (or in addition to a fix), try to run with the new
> dictionary lookup. It will have a different behavior, and it will be the
> default dictionary lookup in future releases of cTakes – making fixes to
> the current module slightly less urgent.
>
> Sean
>
> *From:* Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> *Sent:* Wednesday, October 08, 2014 11:38 AM
> *To:* dev@ctakes.apache.org
> *Subject:* Differences in MedicationMention annotations on subsequent
> processing runs
>
> I have encountered a situation in which the cTakes clinical pipeline
> output differs between multiple runs on the same text with the same
> configuration.
>
> The following snippets from a single document are sufficient to
> demonstrate the issue:
>
> a gentle curve going into. irrigated with Bacitracin.
>
> The source of the difference is that the DictionaryLookupAnnotator uses a
> map to filter out duplicate annotations for a single document location:
>
> // used to prevent duplicate hits
> // key = hit begin,end key (java.lang.String)
> // val = Set of MetaDataHit objects
> private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
>
> This map is shared between both the umls_ms_2011ab lookup and the
> umls_ms_2011an_rxnorm lookup.
>
> If both dictionaries contain the same term, the order of dictionary lookup
> execution determines the output. If the rxnorm lookup runs first, then a
> MedicationMention annotation for Bacitracin appears in the final output. If
> the standard umls lookup runs first, then there is no MedicationMention
> annotation for Bacitracin.
>
> I will attach the output from the subsequent runs. (Hopefully the
> attachment will make it through the system)
>
> Is this expected behavior? If not, what would be the expected behavior?
>
> IMAT Solutions <http://imatsolutions.com>
>
> *Bruce Tietjen*
> Senior Software Engineer
> Mobile: 801.634.1547
> bruce.tiet...@imatsolutions.com
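
For anyone following the thread, the order dependence described in the quoted message can be reproduced in isolation. The sketch below is not the cTakes code: the class name, the annotate method, the lookup names "rxnorm" and "umls", and the span key "27,37" are all made up for illustration. It assumes that every lookup contains the term and that a hit is keyed by span and term only, with nothing identifying the dictionary it came from. It needs Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the annotator; not the cTakes implementation. */
class SharedDupMapSketch {

    /**
     * Runs the named lookups in order against one text span and returns
     * the lookups that were allowed to emit an annotation.
     * Assumption: a hit is identified by span key and term only.
     */
    static List<String> annotate(List<String> lookupOrder, boolean sharedMap) {
        // One map for all lookups, mirroring the shared iv_dupMap in the thread.
        Map<String, Set<String>> shared = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String lookup : lookupOrder) {
            // Alternative: give each lookup its own duplicate filter.
            Map<String, Set<String>> dupMap = sharedMap ? shared : new HashMap<>();
            String spanKey = "27,37"; // made-up begin,end offsets
            Set<String> seen = dupMap.computeIfAbsent(spanKey, k -> new HashSet<>());
            // Set.add returns false when the term was already recorded,
            // so a later lookup is filtered out as a duplicate.
            if (seen.add("bacitracin")) {
                emitted.add(lookup);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(annotate(List.of("rxnorm", "umls"), true));  // [rxnorm]
        System.out.println(annotate(List.of("umls", "rxnorm"), true));  // [umls]
        System.out.println(annotate(List.of("umls", "rxnorm"), false)); // [umls, rxnorm]
    }
}
```

With the shared map, whichever lookup runs first claims the span. That matches the report that the MedicationMention appears only when the rxnorm lookup runs first. Giving each lookup its own map is one possible way to remove the order dependence in this toy model. Whether that is the right fix for the real annotator is a separate question.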