Thanks Sean and Nishant for the help. Sean, the document you sent was really helpful. I was able to successfully create a dictionary using the dictionary-gui. But I'm still not able to use the dictionary. It would be great if you could help me out.
I got a .script file, a .properties file, a .rc file and a .xml file on running the dictionary-gui, as Sean mentioned here:

http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E

Then I changed the file URL in both UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to [new xml file name].xml, in the directory [cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/. Here's the part where I changed it:

    <name>DictionaryDescriptorFile</name>
    <description/>
    <fileResourceSpecifier>
        <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
    </fileResourceSpecifier>
    <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, the log line describing the dictionary resource shows that cTakesHsql.xml is still being used instead of the new file:

    INFO DictionaryDescriptorParser - Parsing dictionary specifications: /home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Another issue I'm facing: even when I simply replace the contents of cTakesHsql.xml with the contents of the new xml file, it does not return any codes (ICD, RXNORM etc.), although the original cTakesHsql.xml was returning a few codes. I have a feeling this has to do with the keys and values in:

    <property key="snomedct_us_2016_09_01Table" value="long"/>
    <property key="rxnorm_16aa_160906fTable" value="long"/>
    <property key="icd10pcs_2017Table" value="text"/>
    <property key="icd10cm_2017Table" value="text"/>
    <property key="icd9cm_2014Table" value="text"/>

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location to make cTAKES use my custom dictionary instead of the default one?
2. What do the key and value above actually correspond to? Do I need to make any changes to them?

I saw a lot of class files that contain terms like "RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those files too? For example, in "IdentifiedAnnotation.class", I can see lines like:

    private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");
    public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
    public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";

Thanks and Best Regards
Pratik Agarwal

On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> Hi Pratik,
>
> It sounds like you are running using code from trunk. That is good.
>
> I have attached a document that outlines how you can use a dictionary
> creator gui to make a database with any umls source vocabulary that you
> need. That would be section 6.1. It also outlines use of a bsv (similar
> to csv) file in section 6.2.
>
> Please provide questions and feedback as I would like to improve this
> document before making it public on the ctakes website.
>
> Some brief information on how the fast dictionary lookup works can be
> viewed here:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>
> Sean
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Monday, December 05, 2016 6:21 AM
> To: u...@ctakes.apache.org
> Subject: Fwd: Dictionary in cTAKES
>
> Hi everyone
>
> I came across cTAKES fairly recently and I'm facing some difficulties with
> understanding the working of it. I am required to map clinical text notes
> with the ICD-10-CM and CPT/HCPCS codes. From what I read, or tried, the
> default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
> ICD9CM.
>
> I am currently trying to work with the user version of cTAKES in Intellij
> IDEA with Java Oracle JDK 8.
>
> It would be great if someone could help me out. I am really sorry if this
> is too easy a problem, but I've been trying to solve it for a while and I'm
> stuck.
>
> I was able to extract ICD9CM codes from cTAKES with the default resources
> i.e. ctakesnorx.properties and ctakesnorx.script
>
> I wanted to get ICD10CM and ICD10PCS codes, so I downloaded .script and
> .properties file from this source:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__sourceforge.net_p_ctakesresources_code_HEAD_&d=DgIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=bi6ZrIdoDcJEj2PmWAHAYAn6pAvjslf1QfJGV0SxFK4&s=MPTFgN4f0bdBiw3lmHgNeGg19MTkQUVjdMDxT0DDFYA&e=tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> and made corresponding changes to the cTakesHsql.xml file as mentioned by
> Sean in:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mail-2Darchive.com_dev-40ctakes.apache.org_msg02597.html&d=DgIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=bi6ZrIdoDcJEj2PmWAHAYAn6pAvjslf1QfJGV0SxFK4&s=X84zAxr2pdhp4sOWuOQ1wfpEGCLtD9s16dB7DaTghc0&e=
>
> But this doesn't seem to work. I played around a bit with the parameters
> in the following lines:
>
> <property key="snomedct_usTable" value="long"/>
> <property key="rxnormTable" value="text"/>
> <property key="icd9cmTable" value="text"/>
> <property key="icd10pcsTable" value="text"/>
>
> Basically, I was getting blank outputs after making the change.
> I am using OntologyConceptUtil.getSchemeCodes(JCas) for getting the
> outputs.
>
> I was getting an error with rxnormTable, so I commented that line out, and
> after that I was getting blank output. So I tried replacing value="text"
> with value="icd9cm" for key="icd9cmTable" and it started returning
> ICD9CM codes. But I couldn't get anything when I did the same with
> ICD10PCS. I again got a blank output.
>
> Note: I did all this after commenting out:
>
> <property key="snomedTable" value="snomedct"/>
> <property key="rxnormTable" value="rxnorm"/>
> <property key="icd9Table" value="icd9cm"/>
> <property key="icd10Table" value="icd10pcs"/>
>
> It would be great if someone could help me understand how the dictionary
> mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes
> from this.
>
> (i) What are the keys and values mentioned above and where can I find
> these in the script or properties file? Is there a way I can access these?
> Please help me understand how this is working.
>
> (ii) I have a csv file containing the ICD codes with the code in Column 1
> and description in Column 2 and similarly for CPT/HCPCS codes. What are the
> steps I need to take to make it work with
> OntologyConceptUtil.getSchemeCodes(JCas)?
>
> I saw from different forums that we can use the dictionary-gui tool from
> sandbox. But I am not really understanding which files I need to run in
> that folder. Also, where in the project tree should I place this folder to
> make it run? Also, what are the parameters required and where do I change
> them, if any?
>
> Thanks a lot.
>
> Regards,
> Pratik Agarwal
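
A possible way to narrow down the first question in this thread (why cTakesHsql.xml is still the file being parsed): the log path points into IntelliJ's out/production directory, which suggests the descriptor is picked up from the classpath rather than from the edited source tree. The sketch below is only a diagnostic under that assumption. The class name DescriptorLocator and the method locateAll are hypothetical helpers, not part of cTAKES, and it uses only standard JDK calls. It lists every classpath copy of a resource path in lookup order, so a stale compiled copy or a missing new file becomes visible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper (not part of cTAKES): lists classpath copies of a resource. */
final class DescriptorLocator {

    // Resource path of the default descriptor, taken from the log line quoted in this thread.
    static final String DEFAULT_DESCRIPTOR =
            "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml";

    /** Returns every classpath location of the given resource path, or an empty list if none. */
    static List<URL> locateAll(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources returns all matches in class loader search order.
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the path of the new descriptor as the first argument to check it instead.
        String path = args.length > 0 ? args[0] : DEFAULT_DESCRIPTOR;
        List<URL> hits = locateAll(path);
        if (hits.isEmpty()) {
            System.out.println("Not on the classpath: " + path);
        }
        for (URL hit : hits) {
            System.out.println(hit);
        }
    }
}
```

Run it with the same classpath as the pipeline (for example as a second run configuration in the same IntelliJ module). If the new descriptor is reported as not on the classpath, or the only copy of cTakesHsql.xml sits under out/production, a rebuild or a change to the copy under the resources directory may be what is missing. This is a guess to verify, not a confirmed description of how cTAKES resolves the file.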