Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
Hi there, I am using cTAKES to process 5000K free-text records, each of which contains several medications. This is the fixed flow that each record goes through: SimpleSegmentAnnotator

Re: Ctakes to process 5000K recoreds

2014-09-09 Thread Pei Chen
Nick, When you say no medication is being annotated, I presume you mean that the medication attributes (i.e. dosage, frequency, etc.) are not being annotated? I think the DrugNER needs a list of section names in the config; I think it includes SIMPLE_SEGMENT. I am very surprised that SimpleSegementAn

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
Pei, I need the names of the medications for the application that I wrote, which uses cTAKES. So I cache the medications in DictionaryLookupAnnotator (in performLookup()) and use them in my program, but when I have SimpleSegmentAnnotator in the pipeline it just takes forever. After taking SimpleSegmentAnnotator

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Masanz, James J.
I suspect that when you take out the SimpleSegmentAnnotator, nothing is getting processed, and that is why it appears so fast. At least some of the annotators loop through the list of sections/segments, which is why there is a simple segment annotator - so that there is at least one section/segmen
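The point above can be illustrated with a toy model. This is only a sketch in Python, not cTAKES code: `Document`, `simple_segment_annotator`, `toy_lookup_annotator`, and the sample text are all made up for illustration. It assumes a downstream annotator that only does work inside segments, so a pipeline with no segment annotator finishes quickly because it produces nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a CAS: text plus segments and annotations."""
    text: str
    segments: list = field(default_factory=list)      # (begin, end) spans
    annotations: list = field(default_factory=list)   # matched terms


def simple_segment_annotator(doc):
    """Cover the whole document with a single segment."""
    doc.segments.append((0, len(doc.text)))


def toy_lookup_annotator(doc, terms):
    """Look terms up, but only inside segments that already exist."""
    for begin, end in doc.segments:
        window = doc.text[begin:end].lower()
        for term in terms:
            if term in window:
                doc.annotations.append(term)


def run(text, with_segmenter):
    """Run the toy pipeline with or without the segment annotator."""
    doc = Document(text)
    if with_segmenter:
        simple_segment_annotator(doc)
    toy_lookup_annotator(doc, ["amoxicillin", "ciprofloxacin"])
    return doc.annotations
```

Calling `run("Patient was given Amoxicillin.", with_segmenter=False)` returns an empty list: the lookup loop has no segments to iterate over, so the run is fast but empty.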

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
I am only interested in medication names, so I use cTakes for that sole purpose for now (the future plan is to use other parts of cTakes). I don't believe I am getting any annotations either. If I only want to identify the medication/antibiotic names in a text like this: Urine culture

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
James, Do you have any suggestions for running cTakes with the minimum set of annotators that can return medications in DictionaryLookupAnnotator? Thanks, Nick

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Masanz, James J.
If you just need the medication names, you can remove these: ContextDependentTokenizerAnnotator, DependencyParser, AssertionAnnotator. You might be able to get rid of the LvgAnnotator and still get decent results since variations of word form should not affect medication names. I would try with
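As a rough sketch of the trimming suggested above, written in Python rather than as a real UIMA descriptor: the `FULL_PIPELINE` list is an illustrative assumption and not the actual contents of any cTAKES aggregate descriptor, and `trim_pipeline` is a hypothetical helper. Only the names of the removable annotators come from the message.

```python
# Illustrative annotator order; NOT copied from a real cTAKES descriptor.
FULL_PIPELINE = [
    "SimpleSegmentAnnotator",
    "SentenceDetectorAnnotator",   # assumed to be present
    "TokenizerAnnotator",          # assumed to be present
    "LvgAnnotator",
    "ContextDependentTokenizerAnnotator",
    "DependencyParser",
    "DictionaryLookupAnnotator",
    "AssertionAnnotator",
]

# Named in the message as removable when only medication names are needed.
REMOVABLE = {
    "ContextDependentTokenizerAnnotator",
    "DependencyParser",
    "AssertionAnnotator",
}

# Named in the message as possibly removable; results should be checked.
OPTIONAL = {"LvgAnnotator"}


def trim_pipeline(pipeline, drop_lvg=False):
    """Return the pipeline without the removable annotators, keeping order."""
    drop = REMOVABLE | (OPTIONAL if drop_lvg else set())
    return [name for name in pipeline if name not in drop]
```

Keeping `drop_lvg` as a separate switch mirrors the message: the first three removals are stated plainly, while dropping LvgAnnotator is something to try and compare.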

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
Thanks, let me try it. Nick

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Finan, Sean
Hi Nick, I think that the bottleneck is probably the lookup module itself. So, I just sent you a secure email/ftp link. It contains a build of the new dictionary-lookup-fast module. Should you choose to try it, let me know how things turn out. Sean F

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
Hi Sean, Many thanks, I will try it tomorrow. Do you have any special instructions for running that script, or do I have to use it with cTakes? Thanks, Nick

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Finan, Sean
Just use it with cTakes. Instead of removing other modules from the pipeline, replace the dictionary-lookup with dictionary-lookup-fast. For the desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml, you would modify: To be:

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Nick Nikandish
Great. I will do that. Thanks again. Nick

Re: Ctakes to process 5000K recoreds

2014-09-09 Thread Bruce Tietjen
Sean, If that is a script for generating a dictionary for use with dictionary-lookup-fast, I would also be very interested in checking it out. Thanks, Bruce

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Finan, Sean
There is a tool to generate a dictionary in the new format using the UMLS MR*** files. The module can also read directly from a file with bar-separated values: CUI|Text or CUI|TUI|Text which could be useful for small custom dictionaries. I can send a copy of the dictionary creator jar and in
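A minimal Python sketch of reading the bar-separated layout described above (`CUI|Text` or `CUI|TUI|Text`). This is not the module's own reader: `DictionaryEntry`, `parse_bsv_line`, and `parse_bsv` are hypothetical names, skipping blank lines is an assumption, and the CUI and TUI values in the usage note are placeholders.

```python
from typing import NamedTuple, Optional


class DictionaryEntry(NamedTuple):
    cui: str
    tui: Optional[str]   # None when the line uses the two-field form
    text: str


def parse_bsv_line(line):
    """Parse one 'CUI|Text' or 'CUI|TUI|Text' line into an entry."""
    fields = [part.strip() for part in line.strip().split("|")]
    if len(fields) == 2:
        cui, text = fields
        return DictionaryEntry(cui, None, text)
    if len(fields) == 3:
        cui, tui, text = fields
        return DictionaryEntry(cui, tui, text)
    raise ValueError(f"expected 2 or 3 bar-separated fields: {line!r}")


def parse_bsv(lines):
    """Parse an iterable of lines, skipping blank ones (an assumption)."""
    return [parse_bsv_line(line) for line in lines if line.strip()]
```

For example, `parse_bsv(["C0000001|aspirin", "C0000002|T121|amoxicillin"])` yields two entries, the first with `tui` set to `None`. A line with any other number of fields raises `ValueError` rather than being silently dropped.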

Re: Ctakes to process 5000K recoreds

2014-09-09 Thread Chen, Pei
Sean- Aren't the scripts to generate the DB already available in the sandbox area?

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Finan, Sean
Yes, the code is in the sandbox.

Re: Ctakes to process 5000K recoreds

2014-09-09 Thread Chen, Pei
(Trying to avoid passing individual jars via email)

Recommendation for ctakes default (UMLS) dictionaries

2014-09-09 Thread andy mcmurry
Greetings ctakes-dev: UMLS license restrictions have been getting more lax over the years -- much of the UMLS can be downloaded directly from the NCBI official FTP site. In fact, the NIH (and implicitly the NLM) have already made the standard terms public for some medical specialities. For

RE: Ctakes to process 5000K recoreds

2014-09-09 Thread Finan, Sean
>(Trying to avoid passing individual jars via email) Understood. I sent the latest (Saturday) build of the dictionary module that I haven't yet checked in. Its dictionary format is incompatible with the format produced by the creator in sandbox. I will check in all of the code changes once I

Re: to map UMLS CUI with normalized form

2014-09-09 Thread andy mcmurry
For bioinformatics folks: preferred disease names are already mapped by the NCBI: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/disease_names On Mon, Sep 1, 2014 at 2:32 PM, Peter Szolovits wrote: > A single CUI may have many different preferred names in different > vocabularies. If you have a m