Re: Dictionary in cTAKES

pratik agarwal Sat, 17 Dec 2016 06:44:57 -0800

Thanks, Sean, for the reply. You were right. cTakesHsql.xml was being
used because I was calling the getFastPipeline() method, which uses
*AnalysisEngineFactory.createEngineDescription()*, where the file resource
defaults to cTakesHsql.xml. Now, to answer how I am
using cTAKES:


1. I am using cTAKES in IntelliJ IDEA. I downloaded the user version and
added the compiled binaries to my classpath. The *OntologyConceptUtil.java*
was not in there, so I separately downloaded it from trunk and added that
too to my classpath.

2. I wrote a simple Java program that calls the *getFastPipeline()* method
from the *ClinicalPipelineFactory* class and then uses
OntologyConceptUtil.getSchemeCodes(*IdentifiedAnnotation object*) to get the
codes in a document. When I used the original cTakesHsql.xml, with
ctakessnorx.script and ctakessnorx.properties as the sources, I was
getting codes like:

Entity: coenzyme Q10=== codes:* {RXNORM=[21406], SNOMEDCT=[412129003,
412130008]}*

I was also getting the ICD-9 codes, like:


*wound opening=== codes: {ICD9CM=[870-897.99], SNOMEDCT=[269362000,
59091005, 125643001, 157351009, 157439006, 269347002]}*

But when I modified cTakesHsql.xml as you described here:
https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

Note: I did change
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx"/>

to
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015"/>


I got:
*wound opening=== codes:{}*


So I tried changing <property key="icd9cm_2014Table" value="text"/> to
<property key="icd9cm_2014Table" value="icd9cm"/> and I got:

*wound opening=== codes:{ICD9CM=[870-897.99]}*

But no success with ICD10CM or RXNORM.
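
For anyone checking the same thing: going by Sean's explanation quoted below (each key is a table name plus the "Table" suffix, and each value is the datatype of the codes in that table), a minimal sketch like the following lists the tables a descriptor expects. It uses only the JDK XML parser. The class and method names are hypothetical and not part of cTAKES, and it assumes the properties look like the ones quoted in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of cTAKES.
final class DescriptorTables {

    private static final String SUFFIX = "Table";

    // Maps table name -> declared code type for every <property>
    // whose key ends in "Table" (assumption: that suffix marks a table).
    static Map<String, String> tableTypes(String xml) {
        Map<String, String> tables = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String key = prop.getAttribute("key");
                if (key.endsWith(SUFFIX) && key.length() > SUFFIX.length()) {
                    String table = key.substring(0, key.length() - SUFFIX.length());
                    tables.put(table, prop.getAttribute("value"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor XML", e);
        }
        return tables;
    }

    public static void main(String[] args) {
        // Property lines copied from this thread, wrapped in a root element.
        String xml = "<properties>"
                + "<property key=\"snomedct_us_2016_09_01Table\" value=\"long\"/>"
                + "<property key=\"rxnorm_16aa_160906fTable\" value=\"long\"/>"
                + "<property key=\"icd10cm_2017Table\" value=\"text\"/>"
                + "<property key=\"icd9cm_2014Table\" value=\"text\"/>"
                + "</properties>";
        tableTypes(xml).forEach((table, type) -> System.out.println(table + " -> " + type));
    }
}
```

If that reading of the keys is right, the names printed here are the ones that should exist as tables in the database the jdbc URL points to.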

Then I followed the process that was mentioned in the document you sent. I
managed to get the .script, .properties, .rc file and the .xml file as
expected. But even with that xml file, I'm simply getting empty values
printed:

*wound opening=== codes:{}*
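
One way to see which tables a generated database actually contains is to scan the .script file for its CREATE TABLE statements and compare them with the table names expected from the XML. The following is a minimal sketch, assuming the HSQLDB .script file stores each table as a plain-text `CREATE [MEMORY|CACHED|TEXT] TABLE [SCHEMA.]NAME(` line and that unquoted table names can be compared case-insensitively. The class and method names are hypothetical, not part of cTAKES or HSQLDB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of cTAKES or HSQLDB.
final class ScriptTables {

    // Assumed statement shape: CREATE [MEMORY|CACHED|TEXT] TABLE [SCHEMA.]NAME(...)
    private static final Pattern CREATE_TABLE = Pattern.compile(
            "^CREATE\\s+(?:(?:MEMORY|CACHED|TEXT)\\s+)?TABLE\\s+(?:\\w+\\.)?(\\w+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the lower-cased table names found in the given script lines.
    static List<String> tableNames(List<String> scriptLines) {
        List<String> names = new ArrayList<>();
        for (String line : scriptLines) {
            Matcher m = CREATE_TABLE.matcher(line.trim());
            if (m.find()) {
                names.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return names;
    }

    // Returns the expected table names that the script does not define.
    static List<String> missing(List<String> expected, List<String> scriptLines) {
        List<String> present = tableNames(scriptLines);
        List<String> absent = new ArrayList<>();
        for (String name : expected) {
            if (!present.contains(name.toLowerCase(Locale.ROOT))) {
                absent.add(name);
            }
        }
        return absent;
    }

    // Usage: java ScriptTables <file.script> [expectedTable ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ScriptTables <file.script> [expectedTable ...]");
            return;
        }
        List<String> creates;
        // ISO-8859-1 decodes any byte sequence, so odd bytes in data rows cannot abort the scan.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            creates = lines
                    .filter(l -> l.regionMatches(true, 0, "CREATE", 0, 6))
                    .collect(Collectors.toList());
        }
        List<String> expected = Arrays.asList(args).subList(1, args.length);
        System.out.println("tables in script: " + tableNames(creates));
        System.out.println("expected but missing: " + missing(expected, creates));
    }
}
```

Passing the .script path followed by the table names expected from the XML keys would show any name the descriptor mentions that the database does not define.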

It would be really great if you can help me understand what I might be
doing wrong.

Thanks and Best Regards.
Pratik Agarwal



On Fri, Dec 16, 2016 at 7:57 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Pratik,
>
>
>
> How are you running ctakes?  If you are running it using the older uima
> style then editing the descriptor files (*Annotator.xml) as you have done
> should work.  If you are running it with a UimaFit class or a piper file
> then you will need to redirect to your custom dictionary config .xml in
> another manner.  The pains of progress …
>
>
>
> 1.        Let me know how you launch ctakes.
>
> a.       If you are launching by directly running a class then you will
> need to override a default parameter functionally.  Edit your call
>
> “AnalysisEngineFactory.createEngineDescription(
> DefaultJCasTermAnnotator.class );”
>
> And add “,JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY, [my].xml” to the
> call right after “.class”.  Note the comma.
>
> b.      If you are running with the DefaultFastPipeline.piper file (used
> by bin/runClinicalPipeline) then you can edit the piper file and add a line
> with “addParameters DictionaryDescriptor=[my].xml”.  Add it above the
> line “add DefaultJCasTermAnnotator”.  If you updated trunk within the last
> few days you can use “set” in place of “addParameters”.  The
> DefaultFastPipeline.piper is in
> resources/org/apache/ctakes/clinical/pipeline/piper/
>
> 2.       “key” and “value” indicate the name of a vocabulary table in the
> dictionary database and the datatype of the code values within that table.
> It looks like all of your snomed and rxnorms were able to be stored as
> “long”, but the other vocabularies had at least one character or two
> decimals so they required “text”.
>
> a.       All of the table names in the database are those listed as keys
> but without the “Table” suffix.  For instance, yours are
> “snomedct_us_2016_09_01” , “rxnorm_16aa_160906f” and so forth.
>
> b.      You don’t need to change any named constants (*CODING_SCHEME=*) in
> the code to fetch your data.
>
> If you are getting codes in the CPE then they should be available under
> OntologyConceptArray.
>
> If you are getting codes programmatically then use the class
> OntologyConceptUtil.  It has a tonne of methods that can be used to obtain
> codes, for the entire document, for certain sections, for individual
> annotations, etc.
>
>
>
> I hope that the above is clear.  I will try to add all of this to some
> documentation asap and make it available publicly.  Don’t anybody hold your
> breath though …
>
>
>
> Sean
>
>
>
>
>
>
>
>
> *From:* pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> *Sent:* Friday, December 16, 2016 3:40 AM
> *To:* Finan, Sean
> *Cc:* dev@ctakes.apache.org
> *Subject:* Re: Dictionary in cTAKES
>
>
>
> Thanks Sean and Nishant for the help. Sean, the document you sent was
> really helpful. I was able to successfully create a dictionary using the
> dictionary-gui. But I'm still not able to use the dictionary. It would be
> great if you could help me out.
>
>
>
> I got a .script file, a .properties file, a .rc file and a .xml file on
> running the dictionary-gui as Sean mentioned here:
>
> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Ddev_201601.mbox_-253CCA-2BjqmuyBcv-2Dh67bxg-3DgummpVkE-5FkhOXpSfRvSqx-3DjK3pzZ7WGA-40mail.gmail.com-253E&d=DgMFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=p70NxcbO486VuKQBsHbtqJgTuufOpYLf7I2B_6sJMY0&s=WnxjzqXpPeLEVlgbJ2qea8Z1rdQjih37ci-zwB9rIE4&e=>
>
>
>
> Then I changed the *file url* in both *UmlsLookupAnnotator.xml* and
> *UmlsOverlapLookupAnnotator.xml* from cTakesHsql.xml to
> [new xml file name].xml in the directory [cTAKES
> root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/:
>
>
>
> Here's the part where I changed it:
>
>             <name>DictionaryDescriptorFile</name>
>             <description/>
>             <fileResourceSpecifier>
>                   <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
>             </fileResourceSpecifier>
>             <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>
>
>
>
> But when I run the program, in the line describing the dictionary resource
> used, I see that the cTakesHsql.xml is still being used instead of the new
> one. Here is what it looks like:
>
>
>
> *INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> /home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml*
>
>
>
> Another issue I'm facing is, even when I simply replace the contents of
> cTakesHsql.xml with the contents of the new xml file, it's not returning
> any codes (*ICD,RXNORM etc.*), although the original cTakesHsql.xml was
> returning a few codes. I have a feeling this has to do with the *keys* and
> *values* in:
>
>
>
>             <property key="snomedct_us_2016_09_01Table" value="long"/>
>             <property key="rxnorm_16aa_160906fTable" value="long"/>
>             <property key="icd10pcs_2017Table" value="text"/>
>             <property key="icd10cm_2017Table" value="text"/>
>             <property key="icd9cm_2014Table" value="text"/>
>
>
>
> Can you please guide me on the following 2 questions:
>
>
>
> 1. Where do I need to change the resource xml file location to make
> cTAKES use my custom dictionary instead of the default one?
>
>
>
> 2. What do the *key* and *value* above actually correspond to? Do I need
> to make any changes to it? I saw a lot of class files that contain terms
> like "RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in
> those files too?
>
> For example, in "IdentifiedAnnotation.class", I can see lines like:
>
>
>
> private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");
>
> public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
>
> public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";
>
>
>
>
>
> Thanks and Best Regards
>
> Pratik Agarwal
>
>
>
>
>
> On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
>
> Hi Pratik,
>
> It sounds like you are running using code from trunk.  That is good.
>
> I have attached a document that outlines how you can use a dictionary
> creator gui to make a database with any umls source vocabulary that you
> need.  That would be section 6.1.  It also outlines use of a bsv (similar
> to csv) file in section 6.2.
>
> Please provide questions and feedback as I would like to improve this
> document before making it public on the ctakes website.
>
> Some brief information on how the fast dictionary lookup works can be
> viewed here:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>
> Sean
>
>
>
>
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Monday, December 05, 2016 6:21 AM
> To: u...@ctakes.apache.org
> Subject: Fwd: Dictionary in cTAKES
>
> Hi everyone
>
> I came across cTAKES fairly recently and I'm having some difficulty
> understanding how it works. I need to map clinical text notes
> to ICD-10-CM and CPT/HCPCS codes. From what I have read and tried, the
> default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
> ICD9CM.
>
> I am currently trying to work with the user version of cTAKES in IntelliJ
> IDEA with Oracle JDK 8.
>
> It would be great if someone could help me out. I am really sorry if this
> is too easy a problem, but I've been trying to solve it for a while and I'm
> stuck.
>
> I was able to extract ICD9CM codes from cTAKES with the default resources,
> i.e. ctakessnorx.properties and ctakessnorx.script.
>
> I wanted to get ICD10CM and ICD10PCS codes, so I downloaded .script and
> .properties file from this source:
>
> https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> and made corresponding changes to the cTakesHsql.xml file as mentioned by
> Sean in:
>
> https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html
>
>
> But this doesn't seem to work. I played around a bit with the parameters
> in the following lines:
>
>         <property key="snomedct_usTable" value="long"/>
>         <property key="rxnormTable" value="text"/>
>         <property key="icd9cmTable" value="text"/>
>         <property key="icd10pcsTable" value="text"/>
>
> Basically, I was getting blank output after making the change.
> I am using OntologyConceptUtil.getSchemeCodes(JCas) to get the
> output.
>
> I was getting an error with rxnormTable, so I commented that line out, and
> after that I was getting blank output. So I tried replacing value="text"
> with value="icd9cm" for key="icd9cmTable", and it started returning
> ICD9CM codes. But I couldn't get anything when I did the same with
> ICD10PCS; I again got blank output.
>
> Note: I did all this after commenting out:
>
>         <property key="snomedTable" value="snomedct"/>
>         <property key="rxnormTable" value="rxnorm"/>
>         <property key="icd9Table" value="icd9cm"/>
>         <property key="icd10Table" value="icd10pcs"/>
>
>
> It would be great if someone could help me understand how the dictionary
> mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes
> from this.
>
> (i) What are the keys and values mentioned above and where can I find
> these in the script or properties file? Is there a way I can access these?
> Please help me understand how this is working.
>
> (ii) I have a csv file containing the ICD codes, with the code in Column 1
> and the description in Column 2, and a similar file for CPT/HCPCS codes.
> What steps do I need to take to make it work with
> OntologyConceptUtil.getSchemeCodes(JCas)?
>
>
> I saw on different forums that we can use the dictionary-gui tool from
> sandbox. But I don't really understand which files I need to run in
> that folder, where in the project tree I should place this folder to
> make it run, or which parameters are required and where to change
> them, if any.
>
> Thanks a lot.
>
> Regards,
> Pratik Agarwal
>
>
>
