(Trying to avoid passing individual jars via email)

Sent from my iPhone

> On Sep 9, 2014, at 5:26 PM, "Chen, Pei" <pei.c...@childrens.harvard.edu>
> wrote:
>
> Sean-
> Aren't the scripts to generate the DB already available in the sandbox area?
>
> Sent from my iPhone
>
>> On Sep 9, 2014, at 5:24 PM, "Finan, Sean" <sean.fi...@childrens.harvard.edu>
>> wrote:
>>
>> There is a tool to generate a dictionary in the new format using the UMLS
>> MR*** files.
>>
>> The module can also read directly from a file with bar-separated values:
>> CUI|Text or CUI|TUI|Text which could be useful for small custom dictionaries.
>>
>> I can send a copy of the dictionary creator jar and instructions tomorrow.
>>
>> Sean
>> ________________________________________
>> From: Bruce Tietjen [bruce.tiet...@perfectsearchcorp.com]
>> Sent: Tuesday, September 09, 2014 5:17 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: Ctakes to process 5000K recoreds
>>
>> Sean,
>>
>> If that is a script for generating a dictionary for use with
>> dictionary-lookup-fast, I would also be very interested in checking it out.
>>
>> Thanks,
>>
>> Bruce
>>
>>
>> [image: IMAT Solutions] <http://imatsolutions.com>
>> Bruce Tietjen
>> Senior Software Engineer
>> [image: Mobile:] 801.634.1547
>> bruce.tiet...@imatsolutions.com
>>
>> On Tue, Sep 9, 2014 at 2:40 PM, Nick Nikandish <
>> snika...@emerginghealthit.com> wrote:
>>
>>> Great. I will do that. Thanks again.
>>>
>>> Nick
>>>
>>> -----Original Message-----
>>> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
>>> Sent: Tuesday, September 09, 2014 4:39 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> Just use it with cTakes. Instead of removing other modules from the
>>> pipeline, replace the dictionary-lookup with dictionary-lookup-fast.
>>>
>>> For the
>>> desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
>>> , you would modify:
>>>
>>> <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
>>> <import
>>> location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"/>
>>> </delegateAnalysisEngine>
>>>
>>> To be:
>>>
>>> <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
>>> <import
>>> location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
>>> </delegateAnalysisEngine>
>>>
>>>
>>> That should be it. You can then leave the rest of the module
>>> specifications alone.
>>>
>>> Sean
>>>
>>> ________________________________________
>>> From: Nick Nikandish [snika...@emerginghealthit.com]
>>> Sent: Tuesday, September 09, 2014 4:32 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> Hi Sean,
>>>
>>> Many thanks, I will try it tomorrow. Do you have any special instructions
>>> to run that script, or do I have to use it with cTakes?
>>>
>>> Thanks,
>>> Nick
>>>
>>> -----Original Message-----
>>> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
>>> Sent: Tuesday, September 09, 2014 4:24 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> Hi Nick,
>>>
>>> I think that the bottleneck is probably the lookup module itself. So, I
>>> just sent you a secure email/ftp link. It contains a build of the new
>>> dictionary-lookup-fast module. Should you choose to try it, let me know
>>> how things turn out.
>>>
>>> Sean
>>> ________________________________________
>>> From: Nick Nikandish [snika...@emerginghealthit.com]
>>> Sent: Tuesday, September 09, 2014 4:10 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> Thanks, let me try it.
>>> Nick
>>>
>>> -----Original Message-----
>>> From: Masanz, James J.
[mailto:masanz.ja...@mayo.edu]
>>> Sent: Tuesday, September 09, 2014 4:08 PM
>>> To: 'dev@ctakes.apache.org'
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> If you just need the medication names, you can remove these:
>>> <node>ContextDependentTokenizerAnnotator</node>
>>> <node>DependencyParser</node>
>>> <node>AssertionAnnotator</node>
>>>
>>> You might be able to get rid of the LvgAnnotator and still get decent
>>> results since variations of word form should not affect medication names. I
>>> would try with it and without it on a smaller set of files and see if you
>>> see a difference.
>>>
>>> I believe the others are needed by the default configs for medication
>>> lookup. For example, POS is used to get phrase type. Phrases are used to
>>> remove verb phrases from the lookup and also therefore to keep the lookup
>>> windows from getting too big. I'm more familiar with the other types of
>>> named entities (diseases, symptoms, etc) than with medications.
>>>
>>> -----Original Message-----
>>> From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
>>> Sent: Tuesday, September 09, 2014 3:01 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> James,
>>>
>>> Do you have any suggestions about running cTakes with the minimum set of
>>> annotators that can return medications in DictionaryLookupAnnotator?
>>> Thanks,
>>> Nick
>>>
>>> -----Original Message-----
>>> From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
>>> Sent: Tuesday, September 09, 2014 3:05 PM
>>> To: 'dev@ctakes.apache.org'
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> I suspect that when you take out the simple segment annotator, nothing is
>>> getting processed, and that is why it appears so fast. At least some of the
>>> annotators loop through the list of sections/segments, which is why there
>>> is a simple segment annotator - so that there is at least one
>>> section/segment identified. Are you getting any annotations at all?
>>>
>>> -----Original Message-----
>>> From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
>>> Sent: Tuesday, September 09, 2014 2:02 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Ctakes to process 5000K recoreds
>>>
>>> Pei,
>>> I need the names of the medications for the application that I wrote, which
>>> uses ctakes, so I cache the medications in DictionaryLookupAnnotator (in
>>> performLookup()) and use them in my program, but when I have
>>> SimpleSegmentAnnotator it just takes forever. After taking
>>> SimpleSegmentAnnotator out, no medication name in
>>> DictionaryLookupAnnotator is returned in the code. So I was wondering if
>>> there was a way that I could eliminate SimpleSegmentAnnotator but still
>>> be able to get the medication names in that class?
>>>
>>> Nick
>>>
>>> -----Original Message-----
>>> From: Pei Chen [mailto:chen...@apache.org]
>>> Sent: Tuesday, September 09, 2014 2:54 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: Ctakes to process 5000K recoreds
>>>
>>> Nick,
>>> When you say no medication is being annotated, I presume you mean the
>>> medication attributes (i.e. dosage, frequency, etc.) are not being
>>> annotated? I think the DrugNER needs a list of section names in the
>>> config; I think it includes SIMPLE_SEGMENT. I am very surprised that
>>> SimpleSegmentAnnotator is the bottleneck though; all it does is assume
>>> the entire document is a single section called SIMPLE_SEGMENT.
>>> Have you tried commenting out the DependencyParser if you're not using
>>> those features?
>>>
>>> --Pei
>>>
>>>
>>> On Tue, Sep 9, 2014 at 2:45 PM, Nick Nikandish <
>>> snika...@emerginghealthit.com> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I am using Ctakes to process 5000K free text records where each record
>>> has several medications.
>>>> This is the fixed flow that it goes through:
>>> <node>SimpleSegmentAnnotator</node>
>>> <node>SentenceDetectorAnnotator</node>
>>> <node>TokenizerAnnotator</node>
>>> <node>LvgAnnotator</node>
>>> <node>ContextDependentTokenizerAnnotator</node>
>>> <node>POSTagger</node>
>>> <node>Chunker</node>
>>> <node>LookupWindowAnnotator</node>
>>> <node>DictionaryLookupAnnotatorDB</node>
>>> <node>DependencyParser</node>
>>> <node>AssertionAnnotator</node>
>>>>
>>>> <node>ExtractionPrepAnnotator</node>
>>>>
>>>> But it takes a very long time to process that much data (maybe a week
>>> or so) when I use SimpleSegmentAnnotator. By eliminating
>>> SimpleSegmentAnnotator the process is very fast but no medication is being
>>> annotated. Do you guys have any suggestions?
>>>>
>>>> Thanks,
>>>> Nick
>>>
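
For anyone who wants to try the small custom dictionary route Sean mentions in the thread, here is a rough sketch of what reading the two bar-separated layouts (CUI|Text and CUI|TUI|Text) might look like. It is not the dictionary-lookup-fast reader itself; `DictionaryEntry`, `parse_bsv_line` and `parse_bsv` are hypothetical names made up for illustration, and the sample rows are placeholder values rather than real UMLS codes.

```python
from typing import Iterable, List, NamedTuple, Optional


class DictionaryEntry(NamedTuple):
    """One row of a bar-separated dictionary file (hypothetical helper type)."""
    cui: str
    tui: Optional[str]  # None when the row uses the two-column CUI|Text layout
    text: str


def parse_bsv_line(line: str) -> Optional[DictionaryEntry]:
    """Parse a CUI|Text or CUI|TUI|Text row; return None for blank or malformed rows."""
    fields = [field.strip() for field in line.strip().split("|")]
    if len(fields) == 2 and all(fields):
        return DictionaryEntry(cui=fields[0], tui=None, text=fields[1])
    if len(fields) == 3 and all(fields):
        return DictionaryEntry(cui=fields[0], tui=fields[1], text=fields[2])
    return None


def parse_bsv(lines: Iterable[str]) -> List[DictionaryEntry]:
    """Collect the well-formed entries from an iterable of lines, skipping the rest."""
    entries = []
    for line in lines:
        entry = parse_bsv_line(line)
        if entry is not None:
            entries.append(entry)
    return entries


if __name__ == "__main__":
    # Placeholder rows for illustration only; these are not real UMLS identifiers.
    sample = ["C0000001|example drug", "", "C0000002|T000|another drug"]
    for entry in parse_bsv(sample):
        print(entry)
```

Rows with the wrong number of columns or an empty field are skipped rather than raising, which seemed the friendlier choice for hand-edited files; a stricter loader could report them instead.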