Thanks, Sean, for the reply. You were right. cTakesHsql.xml was being used because I was calling the getFastPipeline() method, which uses *AnalysisEngineFactory.createEngineDescription()*, where the dictionary descriptor file defaults to cTakesHsql.xml. Now, to answer how I am using cTAKES:

1. I am using cTAKES in IntelliJ IDEA. I downloaded the user version and added the compiled binaries to my classpath. *OntologyConceptUtil.java* was not included, so I downloaded it separately from trunk and added it to my classpath as well.

2. I wrote a simple Java program that calls the *getFastPipeline()* method of the *ClinicalPipelineFactory* class and then uses OntologyConceptUtil.getSchemeCodes(*IdentifiedAnnotation object*) to get the codes in a document.

With the original cTakesHsql.xml, which uses ctakessnorx.script and ctakessnorx.properties as its sources, I was getting codes like:

Entity: coenzyme Q10=== codes: {RXNORM=[21406], SNOMEDCT=[412129003, 412130008]}

I was also getting ICD-9 codes, like:

wound opening=== codes: {ICD9CM=[870-897.99], SNOMEDCT=[269362000, 59091005, 125643001, 157351009, 157439006, 269347002]}

But when I modified cTakesHsql.xml as you described here:

https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

(Note: I changed

value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx"/>

to

value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015"/>)

I got:

wound opening=== codes:{}

So I tried changing

<property key="icd9cm_2014Table" value="text"/>

to

<property key="icd9cm_2014Table" value="icd9cm"/>

and I got:

wound opening=== codes:{ICD9CM=[870-897.99]}

But I had no success with ICD10CM or RXNORM.

Then I followed the process described in the document you sent. I got the .script, .properties, .rc and .xml files as expected. But even with that xml file, I only get empty values printed:

wound opening=== codes:{}

It would be really great if you could help me understand what I might be doing wrong.

Thanks and Best Regards,
Pratik Agarwal

On Fri, Dec 16, 2016 at 7:57 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Pratik,
>
> How are you running ctakes? If you are running it using the older uima
> style then editing the descriptor files (*Annotator.xml) as you have done
> should work. If you are running it with a UimaFit class or a piper file
> then you will need to redirect to your custom dictionary config .xml in
> another manner. The pains of progress …
>
> 1. Let me know how you launch ctakes.
>
>    a. If you are launching by directly running a class then you will
>    need to override a default parameter functionally. Edit your call
>
>    “AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class );”
>
>    And add “,JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY, [my].xml” to the
>    call right after “.class”. Note the comma.
>
>    b. If you are running with the DefaultFastPipeline.piper file (used
>    by bin/runClinicalPipeline) then you can edit the piper file and add a
>    line with “addParameters DictionaryDescriptor=[my].xml”. Add it above
>    the line “add DefaultJCasTermAnnotator”. If you updated trunk within
>    the last few days you can use “set” in place of “addParameters”. The
>    DefaultFastPipeline.piper is in
>    resources/org/apache/ctakes/clinical/pipeline/piper/
>
> 2. “key” and “value” indicate the name of a vocabulary table in the
> dictionary database and the datatype of the code values within that table.
> It looks like all of your snomed and rxnorms were able to be stored as
> “long”, but the other vocabularies had at least one character or two
> decimals so they required “text”.
>
>    a. All of the table names in the database are those listed as keys
>    but without the “Table” suffix. For instance, yours are
>    “snomedct_us_2016_09_01”, “rxnorm_16aa_160906f” and so forth.
>
>    b. You don’t need to change any named constants (*CODING_SCHEME=*) in
>    the code to fetch your data.
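
A minimal sketch of the key/value convention described in point 2 above, written in Java since that is the language used in this thread. It reads the five <property> entries quoted further down the thread, strips the "Table" suffix from each key to get the table name, and keeps the value as the datatype of the codes. The class name DescriptorTables, the method tableTypes and the <properties> wrapper element are assumptions made for illustration; this is not cTAKES code and it does not claim to mirror how cTAKES itself parses the descriptor.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of cTAKES.
final class DescriptorTables {

    // Illustrative fragment: only the <property> entries quoted in this thread,
    // wrapped in an assumed <properties> element so that it is well-formed XML.
    static final String SAMPLE =
        "<properties>"
        + "<property key=\"snomedct_us_2016_09_01Table\" value=\"long\"/>"
        + "<property key=\"rxnorm_16aa_160906fTable\" value=\"long\"/>"
        + "<property key=\"icd10pcs_2017Table\" value=\"text\"/>"
        + "<property key=\"icd10cm_2017Table\" value=\"text\"/>"
        + "<property key=\"icd9cm_2014Table\" value=\"text\"/>"
        + "</properties>";

    // Maps each table name (key without the "Table" suffix) to its code datatype.
    static Map<String, String> tableTypes(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String key = prop.getAttribute("key");
                // Only keys ending in "Table" name a vocabulary table.
                if (key.endsWith("Table")) {
                    String table = key.substring(0, key.length() - "Table".length());
                    result.put(table, prop.getAttribute("value"));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse descriptor fragment", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one "table -> datatype" line per property.
        tableTypes(SAMPLE).forEach((table, type) -> System.out.println(table + " -> " + type));
    }
}
```

Under these assumptions, the key icd10cm_2017Table corresponds to a table named icd10cm_2017 whose codes are stored as text, which can then be checked against the table names in the generated .script file.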
>
> If you are getting codes in the CPE then they should be available under
> OntologyConceptArray.
>
> If you are getting codes programmatically then use the class
> OntologyConceptUtil. It has a tonne of methods that can be used to obtain
> codes, for the entire document, for certain sections, for individual
> annotations, etc.
>
> I hope that the above is clear. I will try to add all of this to some
> documentation asap and make it available publicly. Don’t anybody hold your
> breath though …
>
> Sean
>
> TODO SPF
>
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Friday, December 16, 2016 3:40 AM
> To: Finan, Sean
> Cc: dev@ctakes.apache.org
> Subject: Re: Dictionary in cTAKES
>
> Thanks Sean and Nishant for the help. Sean, the document you sent was
> really helpful. I was able to successfully create a dictionary using the
> dictionary-gui. But I'm still not able to use the dictionary. It would be
> great if you could help me out.
>
> I got a .script file, a .properties file, a .rc file and a .xml file on
> running the dictionary-gui as Sean mentioned here:
>
> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E
>
> Then I changed the *file url* in both *UmlsLookupAnnotator.xml* and
> *UmlsOverlapLookupAnnotator.xml* from cTakesHsql.xml to
> [new xml file name].xml in the directory
> [cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/:
>
> Here's the part where I changed it:
>
> <name>DictionaryDescriptorFile</name>
> <description/>
> <fileResourceSpecifier>
>   <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
> </fileResourceSpecifier>
> <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>
>
> But when I run the program, in the line describing the dictionary resource
> used, I see that the cTakesHsql.xml is still being used instead of the new
> one. Here is what it looks like:
>
> INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> /home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>
> Another issue I'm facing is, even when I simply replace the contents of
> cTakesHsql.xml with the contents of the new xml file, it's not returning
> any codes (ICD, RXNORM etc.), although the original cTakesHsql.xml was
> returning a few codes.
> I have a feeling this has to do with the *keys* and *values* in:
>
> <property key="snomedct_us_2016_09_01Table" value="long"/>
> <property key="rxnorm_16aa_160906fTable" value="long"/>
> <property key="icd10pcs_2017Table" value="text"/>
> <property key="icd10cm_2017Table" value="text"/>
> <property key="icd9cm_2014Table" value="text"/>
>
> Can you please guide me on the following 2 questions:
>
> 1. Where do I need to change the resource xml file location, to make
> cTAKES use my custom dictionary instead of the default one.
>
> 2. What do the *key* and *value* above actually correspond to? Do I need
> to make any changes to it? I saw a lot of class files that contain terms
> like "RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in
> those files too?
>
> For example, in "IdentifiedAnnotation.class", I can see lines like:
>
> private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");
> public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
> public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";
>
> Thanks and Best Regards
>
> Pratik Agarwal
>
> On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
>
> Hi Pratik,
>
> It sounds like you are running using code from trunk. That is good.
>
> I have attached a document that outlines how you can use a dictionary
> creator gui to make a database with any umls source vocabulary that you
> need. That would be section 6.1. It also outlines use of a bsv (similar
> to csv) file in section 6.2.
>
> Please provide questions and feedback as I would like to improve this
> document before making it public on the ctakes website.
>
> Some brief information on how the fast dictionary lookup works can be
> viewed here:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>
> Sean
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Monday, December 05, 2016 6:21 AM
> To: u...@ctakes.apache.org
> Subject: Fwd: Dictionary in cTAKES
>
> Hi everyone
>
> I came across cTAKES fairly recently and I'm facing some difficulties with
> understanding the working of it. I am required to map clinical text notes
> with the ICD-10-CM and CPT/HCPCS codes. From what I read, or tried, the
> default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
> ICD9CM.
>
> I am currently trying to work with the user version of cTAKES in Intellij
> IDEA with Java Oracle JDK 8.
>
> It would be great if someone could help me out. I am really sorry if this
> is too easy a problem, but I've been trying to solve it for a while and I'm
> stuck.
>
> I was able to extract ICD9CM codes from cTAKES with the default resources
> i.e.
> ctakessnorx.properties and ctakessnorx.script
>
> I wanted to get ICD10CM and ICD10PCS codes, so I downloaded .script and
> .properties file from this source:
>
> https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> and made corresponding changes to the cTakesHsql.xml file as mentioned by
> Sean in:
>
> https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html
>
> But this doesn't seem to work. I played around a bit with the parameters
> in the following lines:
>
> <property key="snomedct_usTable" value="long"/>
> <property key="rxnormTable" value="text"/>
> <property key="icd9cmTable" value="text"/>
> <property key="icd10pcsTable" value="text"/>
>
> Basically, I was getting blank outputs after making the change.
> I am using OntologyConceptUtil.getSchemeCodes(JCas) for getting the
> outputs.
>
> I was getting an error with rxnormTable. So I commented that line out, and
> after that I was getting blank output.
> So I tried replacing value="text" with value="icd9cm" for
> key="icd9cmTable" and it started returning ICD9CM codes. But I couldn't
> get anything when I did the same with ICD10PCS. I again got a blank output.
>
> Note: I did all this after commenting:
>
> <property key="snomedTable" value="snomedct"/>
> <property key="rxnormTable" value="rxnorm"/>
> <property key="icd9Table" value="icd9cm"/>
> <property key="icd10Table" value="icd10pcs"/>
>
> It would be great if someone could help me understand how the dictionary
> mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes
> from this.
>
> (i) What are the keys and values mentioned above and where can I find
> these in the script or properties file? Is there a way I can access these?
> Please help me understand how this is working.
>
> (ii) I have a csv file containing the ICD codes with the code in Column 1
> and description in Column 2 and similarly for CPT/HCPCS codes. What are the
> steps I need to take to make it work with
> OntologyConceptUtil.getSchemeCodes(JCas).
>
> I saw from different forums that we can use dictionary-gui tool from
> sandbox. But I am not really understanding which files do I need to run in
> that folder. Also, where in the project tree should I place this folder to
> make it run. Also, what are the parameters required and where do I change
> them, if any.
>
> Thanks a lot.
>
> Regards,
> Pratik Agarwal