RE: Dictionary in cTAKES

Finan, Sean Fri, 16 Dec 2016 06:28:04 -0800

Hi Pratik,

How are you running ctakes?  If you are running it using the older uima style 
then editing the descriptor files (*Annotator.xml) as you have done should 
work.  If you are running it with a UimaFit class or a piper file then you will 
need to redirect to your custom dictionary config .xml in another manner.  The 
pains of progress …



1.        Let me know how you launch ctakes.

a.       If you are launching by directly running a class then you will need to 
override a default parameter in code.  Edit your call

AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class );

and add the parameter name and your descriptor path right after “.class”, so 
that it reads

AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class, 
JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY, "[my].xml" );

Note the comma after “.class”.

b.      If you are running with the DefaultFastPipeline.piper file (used by 
bin/runClinicalPipeline) then you can edit the piper file and add a line with 
“addParameters DictionaryDescriptor=[my].xml”.  Add it above the line “add 
DefaultJCasTermAnnotator”.  If you updated trunk within the last few days you 
can use “set” in place of “addParameters”.  The DefaultFastPipeline.piper is in 
resources/org/apache/ctakes/clinical/pipeline/piper/

2.       “key” and “value” indicate the name of a vocabulary table in the 
dictionary database and the datatype of the code values within that table.  It 
looks like all of your snomed and rxnorms were able to be stored as “long”, but 
the other vocabularies had at least one character or two decimals so they 
required “text”.

a.       All of the table names in the database are those listed as keys but 
without the “Table” suffix.  For instance, yours are “snomedct_us_2016_09_01” , 
“rxnorm_16aa_160906f” and so forth.

b.      You don’t need to change any named constants (*CODING_SCHEME=*) in the 
code to fetch your data.
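
As a rough illustration of point 2, here is a minimal Java sketch of how a 
property key maps to a table name and how “long” versus “text” could be chosen.  
This is not the dictionary creator's actual code: the class and method names 
are made up for the example, the codes are samples, and the rule used (whole 
numbers only means “long”, anything else means “text”) is a simplification of 
what is described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of cTAKES.
final class DictionaryTableSketch {

    private static final String TABLE_SUFFIX = "Table";

    // Drops the "Table" suffix from a property key to get the database table name.
    static String tableName(String propertyKey) {
        return propertyKey.endsWith(TABLE_SUFFIX)
                ? propertyKey.substring(0, propertyKey.length() - TABLE_SUFFIX.length())
                : propertyKey;
    }

    // Assumed rule: "long" if every code is a whole number that fits in a long,
    // otherwise "text".
    static String valueType(List<String> codes) {
        for (String code : codes) {
            try {
                Long.parseLong(code);
            } catch (NumberFormatException e) {
                return "text";
            }
        }
        return "long";
    }

    public static void main(String[] args) {
        // Keys and values as they appear in the generated descriptor xml.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("snomedct_us_2016_09_01Table", "long");
        properties.put("rxnorm_16aa_160906fTable", "long");
        properties.put("icd10cm_2017Table", "text");

        for (Map.Entry<String, String> entry : properties.entrySet()) {
            System.out.println(tableName(entry.getKey()) + " -> " + entry.getValue());
        }

        // Sample codes for illustration only.
        System.out.println(valueType(Arrays.asList("73211009", "44054006")));
        System.out.println(valueType(Arrays.asList("E11.9", "I10")));
    }
}
```

The point is only that the keys and values describe what is already in the 
database, so they should be left exactly as the dictionary creator wrote them.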

If you are getting codes in the CPE then they should be available under 
OntologyConceptArray.

If you are getting codes programmatically then use the class 
OntologyConceptUtil.  It has a tonne of methods that can be used to obtain 
codes, for the entire document, for certain sections, for individual 
annotations, etc.

I hope that the above is clear.  I will try to add all of this to some 
documentation asap and make it available publicly.  Don’t anybody hold your 
breath though …
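
In the meantime, the piper edit from 1b can be sketched in plain Java.  This is 
only an illustration: the class name, the method name and the sample piper 
lines are made up, and only the “addParameters DictionaryDescriptor=” line and 
the “add DefaultJCasTermAnnotator” line come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
final class PiperEditSketch {

    static final String ANNOTATOR_LINE = "add DefaultJCasTermAnnotator";

    // Returns a copy of the piper lines with the dictionary parameter inserted
    // directly above the first line that adds DefaultJCasTermAnnotator.
    static List<String> withDictionary(List<String> piperLines, String descriptorXml) {
        List<String> edited = new ArrayList<>(piperLines);
        for (int i = 0; i < edited.size(); i++) {
            if (edited.get(i).trim().startsWith(ANNOTATOR_LINE)) {
                edited.add(i, "addParameters DictionaryDescriptor=" + descriptorXml);
                return edited;
            }
        }
        throw new IllegalArgumentException("No line starting with '" + ANNOTATOR_LINE + "'");
    }

    public static void main(String[] args) {
        // Made-up piper contents; the real DefaultFastPipeline.piper has other lines.
        List<String> piper = Arrays.asList(
                "add SomeEarlierAnnotator",
                "add DefaultJCasTermAnnotator",
                "add SomeLaterAnnotator");

        for (String line : withDictionary(piper, "[my].xml")) {
            System.out.println(line);
        }
    }
}
```

On a recent trunk the inserted line could start with “set” instead of 
“addParameters”, as noted in 1b.  Editing the file by hand works just as well.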

Sean



From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
Sent: Friday, December 16, 2016 3:40 AM
To: Finan, Sean
Cc: dev@ctakes.apache.org
Subject: Re: Dictionary in cTAKES

Thanks Sean and Nishant for the help. Sean, the document you sent was really 
helpful. I was able to successfully create a dictionary using the 
dictionary-gui. But I'm still not able to use the dictionary. It would be great 
if you could help me out.

I got a .script file, a .properties file, a .rc file and a .xml file on running 
the dictionary-gui as Sean mentioned here:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E<https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Ddev_201601.mbox_-253CCA-2BjqmuyBcv-2Dh67bxg-3DgummpVkE-5FkhOXpSfRvSqx-3DjK3pzZ7WGA-40mail.gmail.com-253E&d=DgMFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=p70NxcbO486VuKQBsHbtqJgTuufOpYLf7I2B_6sJMY0&s=WnxjzqXpPeLEVlgbJ2qea8Z1rdQjih37ci-zwB9rIE4&e=>

Then I changed the file url in both UmlsLookupAnnotator.xml and 
UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to [new xml file name].xml  
in the directory [cTAKES 
root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/:

Here's the part where I changed it:
            <name>DictionaryDescriptorFile</name>
            <description/>
            <fileResourceSpecifier>
                  <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
            </fileResourceSpecifier>
            <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, in the line describing the dictionary resource 
used, I see that the cTakesHsql.xml is still being used instead of the new one. 
Here is what it looks like:

INFO DictionaryDescriptorParser - Parsing dictionary specifications: 
/home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Another issue I'm facing is, even when I simply replace the contents of 
cTakesHsql.xml with the contents of the new xml file, it's not returning any 
codes (ICD,RXNORM etc.), although the original cTakesHsql.xml was returning a 
few codes. I have a feeling this has to do with the keys and values in:

            <property key="snomedct_us_2016_09_01Table" value="long"/>
            <property key="rxnorm_16aa_160906fTable" value="long"/>
            <property key="icd10pcs_2017Table" value="text"/>
            <property key="icd10cm_2017Table" value="text"/>
            <property key="icd9cm_2014Table" value="text"/>

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location to make cTAKES use 
my custom dictionary instead of the default one?

2. What do the key and value above actually correspond to? Do I need to make 
any changes to it? I saw a lot of class files that contain terms like "RXNORM", 
"SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those files too?
For example, in "IdentifiedAnnotation.class", I can see lines like:


private static final Logger LOGGER = 
Logger.getLogger("IdentifiedAnnotationUtil");

public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";


Thanks and Best Regards
Pratik Agarwal


On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean 
<sean.fi...@childrens.harvard.edu> wrote:
Hi Pratik,

It sounds like you are running using code from trunk.  That is good.

I have attached a document that outlines how you can use a dictionary creator 
gui to make a database with any umls source vocabulary that you need.  That 
would be section 6.1.  It also outlines use of a bsv (similar to csv) file in 
section 6.2.

Please provide questions and feedback as I would like to improve this document 
before making it public on the ctakes website.

Some brief information on how the fast dictionary lookup works can be viewed 
here: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup

Sean





-----Original Message-----
From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
Sent: Monday, December 05, 2016 6:21 AM
To: u...@ctakes.apache.org
Subject: Fwd: Dictionary in cTAKES

Hi everyone

I came across cTAKES fairly recently and I'm facing some difficulties with 
understanding the working of it. I am required to map clinical text notes with 
the ICD-10-CM and CPT/HCPCS codes. From what I read, or tried, the default 
dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and ICD9CM.

I am currently trying to work with the user version of cTAKES in Intellij IDEA 
with Java Oracle JDK 8.

It would be great if someone could help me out. I am really sorry if this is 
too easy a problem, but I've been trying to solve it for a while and I'm stuck.

I was able to extract ICD9CM codes from cTAKES with the default resources i.e. 
ctakesnorx.properties and ctakesnorx.script

I wanted to get ICD10CM and ICD10PCS codes, so I downloaded .script and 
.properties file from this source:

https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

and made corresponding changes to the cTakesHsql.xml file as mentioned by Sean 
in:

https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

But this doesn't seem to work. I played around a bit with the parameters in the 
following lines:

        <property key="snomedct_usTable" value="long"/>
        <property key="rxnormTable" value="text"/>
        <property key="icd9cmTable" value="text"/>
        <property key="icd10pcsTable" value="text"/>

Basically, I was getting blank outputs after making the change.
I am using OntologyConceptUtil.getSchemeCodes(JCas) for getting the outputs.

I was getting an error with rxnormTable, so I commented that line out, and 
after that I was getting blank output. So I tried replacing value="text" 
with value="icd9cm" for key="icd9cmTable" and it started returning ICD9CM 
codes. But when I did the same with ICD10PCS, I again got a blank output.

Note: I did all this after commenting out:

        <property key="snomedTable" value="snomedct"/>
        <property key="rxnormTable" value="rxnorm"/>
        <property key="icd9Table" value="icd9cm"/>
        <property key="icd10Table" value="icd10pcs"/>


It would be great if someone could help me understand how the dictionary 
mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes from 
this.

(i) What are the keys and values mentioned above and where can I find these in 
the script or properties file? Is there a way I can access these? Please help 
me understand how this is working.

(ii) I have a csv file containing the ICD codes, with the code in Column 1 and 
the description in Column 2, and similarly for CPT/HCPCS codes. What steps do I 
need to take to make it work with OntologyConceptUtil.getSchemeCodes(JCas)?


I saw on different forums that we can use the dictionary-gui tool from sandbox, 
but I don't really understand which files I need to run in that folder. 
Also, where in the project tree should I place this folder to make it run? 
And what parameters are required, if any, and where do I change them?

Thanks a lot.

Regards,
Pratik Agarwal
