Re: Dictionary in cTAKES

pratik agarwal Fri, 16 Dec 2016 00:41:09 -0800

Thanks Sean and Nishant for the help. Sean, the document you sent was
really helpful. I was able to successfully create a dictionary using the
dictionary-gui. But I'm still not able to use the dictionary. It would be
great if you could help me out.


I got a .script file, a .properties file, a .rc file and a .xml file on
running the dictionary-gui as Sean mentioned here:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E

Then I changed the file URL in both UmlsLookupAnnotator.xml and
UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to [new xml file name].xml.
Both files are in the directory
[cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/.

Here's the part where I changed it:
            <name>DictionaryDescriptorFile</name>
            <description/>
            <fileResourceSpecifier>
                <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
            </fileResourceSpecifier>
            <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, in the line describing the dictionary resource
used, I see that the cTakesHsql.xml is still being used instead of the new
one. Here is what it looks like:

INFO DictionaryDescriptorParser - Parsing dictionary specifications:
/home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
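
Since the logged path points into out/production, one thing worth ruling out is
that the descriptor I edited is not the copy that actually gets loaded. Below is
a minimal sketch for listing the fileUrl values found in a descriptor, so the
same check can be run on each copy. FileUrlLister is a hypothetical helper (not
part of cTAKES), and it assumes the descriptor uses unprefixed element names
such as <fileUrl>, as in the fragment quoted earlier.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of cTAKES.
class FileUrlLister {

    // Returns the trimmed text of every <fileUrl> element in the given XML.
    // Assumption: element names are unprefixed (no "ns:fileUrl").
    static List<String> fileUrls(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
            NodeList nodes = doc.getElementsByTagName("fileUrl");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                urls.add(nodes.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor XML", e);
        }
    }

    // Usage: java FileUrlLister path/to/UmlsLookupAnnotator.xml
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: FileUrlLister <descriptor.xml>");
            return;
        }
        // Reads the file as UTF-8, which is an assumption about its encoding.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        for (String url : fileUrls(xml)) {
            System.out.println(url);
        }
    }
}
```

If the copy under desc/ prints the new file name while the log still reports
cTakesHsql.xml, that would suggest the running pipeline reads its configuration
from somewhere else.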

Another issue I'm facing: even when I simply replace the contents of
cTakesHsql.xml with the contents of the new xml file, it doesn't return
any codes (ICD, RXNORM etc.), although the original cTakesHsql.xml was
returning a few codes. I have a feeling this has to do with the keys and
values in:

            <property key="snomedct_us_2016_09_01Table" value="long"/>
            <property key="rxnorm_16aa_160906fTable" value="long"/>
            <property key="icd10pcs_2017Table" value="text"/>
            <property key="icd10cm_2017Table" value="text"/>
            <property key="icd9cm_2014Table" value="text"/>
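
To compare these entries between the original cTakesHsql.xml and the generated
file, a small sketch like the following could list every property whose key
ends in "Table". DictionaryPropertyLister is a hypothetical helper (not part of
cTAKES); it only reads the XML and makes no claim about what cTAKES does with
the keys or values.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of cTAKES.
class DictionaryPropertyLister {

    // Collects key -> value for every <property> whose key ends in "Table",
    // keeping the order in which they appear in the file.
    static Map<String, String> tableProperties(String dictionaryXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dictionaryXml)));
            NodeList nodes = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String key = property.getAttribute("key");
                if (key.endsWith("Table")) {
                    result.put(key, property.getAttribute("value"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse dictionary XML", e);
        }
    }

    // Usage: java DictionaryPropertyLister path/to/dictionary.xml
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: DictionaryPropertyLister <dictionary.xml>");
            return;
        }
        // Reads the file as UTF-8, which is an assumption about its encoding.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : tableProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it on both files side by side would at least show which keys and
values differ between the working and the non-working configuration.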

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location to make cTAKES
use my custom dictionary instead of the default one?

2. What do the key and value above actually correspond to? Do I need to
make any changes to them? I saw a lot of class files that contain terms like
"RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those
files too?
For example, in "IdentifiedAnnotation.class", I can see lines like:

private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");

public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";

public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";


Thanks and Best Regards
Pratik Agarwal


On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> Hi Pratik,
>
> It sounds like you are running using code from trunk.  That is good.
>
> I have attached a document that outlines how you can use a dictionary
> creator gui to make a database with any umls source vocabulary that you
> need.  That would be section 6.1.  It also outlines use of a bsv (similar
> to csv) file in section 6.2.
>
> Please provide questions and feedback as I would like to improve this
> document before making it public on the ctakes website.
>
> Some brief information on how the fast dictionary lookup works can be
> viewed here:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>
> Sean
>
>
>
>
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Monday, December 05, 2016 6:21 AM
> To: u...@ctakes.apache.org
> Subject: Fwd: Dictionary in cTAKES
>
> Hi everyone
>
> I came across cTAKES fairly recently and I'm having some difficulty
> understanding how it works. I need to map clinical text notes to
> ICD-10-CM and CPT/HCPCS codes. From what I have read and tried, the
> default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
> ICD9CM.
>
> I am currently trying to work with the user version of cTAKES in IntelliJ
> IDEA with Oracle JDK 8.
>
> It would be great if someone could help me out. I am really sorry if this
> is too easy a problem, but I've been trying to solve it for a while and I'm
> stuck.
>
> I was able to extract ICD9CM codes from cTAKES with the default resources,
> i.e. ctakesnorx.properties and ctakesnorx.script.
>
> I wanted to get ICD10CM and ICD10PCS codes, so I downloaded the .script and
> .properties files from this source:
>
> https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> and made corresponding changes to the cTakesHsql.xml file as mentioned by
> Sean in:
>
> https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html
>
> But this doesn't seem to work. I played around a bit with the parameters
> in the following lines:
>
>         <property key="snomedct_usTable" value="long"/>
>         <property key="rxnormTable" value="text"/>
>         <property key="icd9cmTable" value="text"/>
>         <property key="icd10pcsTable" value="text"/>
>
> Basically, I was getting blank output after making the change.
> I am using OntologyConceptUtil.getSchemeCodes(JCas) to get the output.
>
> I was getting an error with rxnormTable, so I commented that line out, and
> after that I was getting blank output. So I tried replacing value="text"
> with value="icd9cm" for key="icd9cmTable", and it started returning
> ICD9CM codes. But I couldn't get anything when I did the same with
> ICD10PCS; I again got blank output.
>
> Note: I did all this after commenting out:
>
>         <property key="snomedTable" value="snomedct"/>
>         <property key="rxnormTable" value="rxnorm"/>
>         <property key="icd9Table" value="icd9cm"/>
>         <property key="icd10Table" value="icd10pcs"/>
>
>
> It would be great if someone could help me understand how the dictionary
> mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes
> from this.
>
> (i) What are the keys and values mentioned above and where can I find
> these in the script or properties file? Is there a way I can access these?
> Please help me understand how this is working.
>
> (ii) I have a csv file containing the ICD codes, with the code in Column 1
> and the description in Column 2, and similarly for CPT/HCPCS codes. What
> are the steps I need to take to make it work with
> OntologyConceptUtil.getSchemeCodes(JCas)?
>
>
> I saw on different forums that we can use the dictionary-gui tool from
> sandbox. But I don't really understand which files I need to run in
> that folder. Also, where in the project tree should I place this folder to
> make it run? And what parameters are required, and where do I change
> them, if any?
>
> Thanks a lot.
>
> Regards,
> Pratik Agarwal
>
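
For the CSV question (ii) in the quoted message, and given Sean's pointer to
bsv files (section 6.2 of his document), here is a minimal sketch of turning a
two-column CSV into bar-separated lines. CsvToBsv is a hypothetical helper, and
the "code|description" layout is only an assumption; the columns cTAKES
actually expects are defined in that document, which is not reproduced here.
The sketch also assumes the code column itself contains no comma.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
class CsvToBsv {

    // Converts one "code,description" CSV row into "code|description".
    // Everything after the first comma is treated as the description.
    static String toBsvLine(String csvLine) {
        int comma = csvLine.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected two columns: " + csvLine);
        }
        String code = unquote(csvLine.substring(0, comma));
        String description = unquote(csvLine.substring(comma + 1));
        // A literal bar in the text would break the bar-separated layout.
        return code + "|" + description.replace('|', ' ');
    }

    // Trims the field and strips one pair of surrounding double quotes.
    static String unquote(String field) {
        String t = field.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            t = t.substring(1, t.length() - 1).replace("\"\"", "\"");
        }
        return t;
    }

    // Usage: java CsvToBsv codes.csv codes.bsv
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: CsvToBsv <input.csv> <output.bsv>");
            return;
        }
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args[0]),
                StandardCharsets.UTF_8)) {
            if (!line.trim().isEmpty()) {
                out.add(toBsvLine(line));
            }
        }
        Files.write(Paths.get(args[1]), out, StandardCharsets.UTF_8);
    }
}
```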
