Hi Pratik,

How are you running cTAKES? If you are running it using the older UIMA style, then editing the descriptor files (*Annotator.xml) as you have done should work. If you are running it with a UimaFit class or a piper file, then you will need to redirect to your custom dictionary config .xml in another manner. The pains of progress …

1. Let me know how you launch cTAKES.
   a. If you are launching by directly running a class, then you will need to override a default parameter programmatically. Edit your call “AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class );” and add “,JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY, [my].xml” to the call right after “.class”. Note the comma.
   b. If you are running with the DefaultFastPipeline.piper file (used by bin/runClinicalPipeline), then you can edit the piper file and add a line with “addParameters DictionaryDescriptor=[my].xml”. Add it above the line “add DefaultJCasTermAnnotator”. If you updated trunk within the last few days you can use “set” in place of “addParameters”. The DefaultFastPipeline.piper is in resources/org/apache/ctakes/clinical/pipeline/piper/

2. “key” and “value” indicate the name of a vocabulary table in the dictionary database and the datatype of the code values within that table. It looks like all of your snomed and rxnorm codes could be stored as “long”, but the other vocabularies had at least one code containing a letter or two decimal points, so they required “text”.
   a. All of the table names in the database are those listed as keys but without the “Table” suffix. For instance, yours are “snomedct_us_2016_09_01”, “rxnorm_16aa_160906f” and so forth.
   b. You don’t need to change any named constants (*CODING_SCHEME=*) in the code to fetch your data. If you are getting codes in the CPE then they should be available under OntologyConceptArray. If you are getting codes programmatically then use the class OntologyConceptUtil. It has a tonne of methods that can be used to obtain codes for the entire document, for certain sections, for individual annotations, etc.

I hope that the above is clear. I will try to add all of this to some documentation asap and make it available publicly.
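
To make point 2a concrete, here is a minimal sketch in plain Java (no cTAKES classes) that reads the property lines from a dictionary descriptor and reports the database table name and code datatype for each. The class name DictionaryTables and the method tableTypes are hypothetical names made up for this illustration, and a regular expression stands in for a real XML parser, so treat it as a rough aid rather than how cTAKES itself reads the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of cTAKES.
final class DictionaryTables {

    // Matches entries such as <property key="icd10cm_2017Table" value="text"/>
    // and captures the key without its "Table" suffix plus the value.
    private static final Pattern TABLE_PROPERTY = Pattern.compile(
            "<property\\s+key=\"([^\"]+)Table\"\\s+value=\"([^\"]+)\"\\s*/>");

    private DictionaryTables() {
    }

    /** Returns database table name -> code datatype, in descriptor order. */
    static Map<String, String> tableTypes(String descriptorXml) {
        Map<String, String> tables = new LinkedHashMap<>();
        Matcher matcher = TABLE_PROPERTY.matcher(descriptorXml);
        while (matcher.find()) {
            tables.put(matcher.group(1), matcher.group(2));
        }
        return tables;
    }

    public static void main(String[] args) {
        // Property lines copied from the descriptor quoted in this thread.
        String xml = String.join("\n",
                "<property key=\"snomedct_us_2016_09_01Table\" value=\"long\"/>",
                "<property key=\"rxnorm_16aa_160906fTable\" value=\"long\"/>",
                "<property key=\"icd10cm_2017Table\" value=\"text\"/>");
        tableTypes(xml).forEach(
                (table, type) -> System.out.println(table + " -> " + type));
    }
}
```

Running main against your own descriptor text is a quick way to check which table names the database is expected to contain.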
Don’t anybody hold your breath though …

Sean

From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
Sent: Friday, December 16, 2016 3:40 AM
To: Finan, Sean
Cc: dev@ctakes.apache.org
Subject: Re: Dictionary in cTAKES

Thanks Sean and Nishant for the help.

Sean, the document you sent was really helpful. I was able to successfully create a dictionary using the dictionary-gui. But I'm still not able to use the dictionary. It would be great if you could help me out.

I got a .script file, a .properties file, a .rc file and a .xml file on running the dictionary-gui as Sean mentioned here:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jk3pzz7...@mail.gmail.com%3E

Then I changed the file URL in both UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to [new xml file name].xml in the directory [cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/. Here's the part where I changed it:

    <name>DictionaryDescriptorFile</name>
    <description/>
    <fileResourceSpecifier>
        <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
    </fileResourceSpecifier>
    <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, in the line describing the dictionary resource used, I see that cTakesHsql.xml is still being used instead of the new one.
Here is what it looks like:

    INFO DictionaryDescriptorParser - Parsing dictionary specifications: /home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Another issue I'm facing is that even when I simply replace the contents of cTakesHsql.xml with the contents of the new xml file, it's not returning any codes (ICD, RXNORM etc.), although the original cTakesHsql.xml was returning a few codes. I have a feeling this has to do with the keys and values in:

    <property key="snomedct_us_2016_09_01Table" value="long"/>
    <property key="rxnorm_16aa_160906fTable" value="long"/>
    <property key="icd10pcs_2017Table" value="text"/>
    <property key="icd10cm_2017Table" value="text"/>
    <property key="icd9cm_2014Table" value="text"/>

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location to make cTAKES use my custom dictionary instead of the default one?

2. What do the key and value above actually correspond to? Do I need to make any changes to them? I saw a lot of class files that contain terms like "RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those files too? For example, in "IdentifiedAnnotation.class", I can see lines like:

    private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");
    public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
    public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";

Thanks and Best Regards
Pratik Agarwal

On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

Hi Pratik,

It sounds like you are running using code from trunk. That is good. I have attached a document that outlines how you can use a dictionary creator gui to make a database with any UMLS source vocabulary that you need. That would be section 6.1. It also outlines use of a bsv (similar to csv) file in section 6.2.
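
For readers unfamiliar with the term, bsv is bar-separated values: the same idea as csv but with '|' between fields. The sketch below converts two-column csv lines into bar-separated lines. The column layout that cTAKES actually expects is covered in section 6.2 of the attached document, which is not reproduced in this thread, so the code|description layout, the class name CsvToBsv and the sample rows are all assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of cTAKES.
final class CsvToBsv {

    private CsvToBsv() {
    }

    /**
     * Converts one "code,description" line to "code|description".
     * Only the first comma is treated as a separator, so commas inside the
     * description are kept. Returns null when the line has no comma.
     */
    static String toBsvLine(String csvLine) {
        int comma = csvLine.indexOf(',');
        if (comma < 0) {
            return null;
        }
        String code = unquote(csvLine.substring(0, comma));
        String description = unquote(csvLine.substring(comma + 1));
        // A literal bar in the description would look like a field separator.
        return code + "|" + description.replace('|', ' ');
    }

    /** Trims whitespace and removes one pair of surrounding double quotes. */
    private static String unquote(String field) {
        String s = field.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    /** Converts every line that contains a comma and skips the rest. */
    static List<String> convert(List<String> csvLines) {
        List<String> bsvLines = new ArrayList<>();
        for (String line : csvLines) {
            String bsv = toBsvLine(line);
            if (bsv != null) {
                bsvLines.add(bsv);
            }
        }
        return bsvLines;
    }

    public static void main(String[] args) {
        // Sample rows in the assumed layout: code in column 1, description in column 2.
        List<String> csv = Arrays.asList(
                "E11.9,\"Type 2 diabetes mellitus without complications\"",
                "I10,Essential (primary) hypertension");
        convert(csv).forEach(System.out::println);
    }
}
```

Whether the resulting file can be loaded directly depends on the columns cTAKES requires, so check the output against section 6.2 before pointing a dictionary descriptor at it.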
Please provide questions and feedback as I would like to improve this document before making it public on the cTAKES website.

Some brief information on how the fast dictionary lookup works can be viewed here:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup

Sean

-----Original Message-----
From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
Sent: Monday, December 05, 2016 6:21 AM
To: u...@ctakes.apache.org
Subject: Fwd: Dictionary in cTAKES

Hi everyone

I came across cTAKES fairly recently and I'm having some difficulty understanding how it works. I am required to map clinical text notes to ICD-10-CM and CPT/HCPCS codes. From what I read, or tried, the default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and ICD9CM. I am currently trying to work with the user version of cTAKES in IntelliJ IDEA with Oracle JDK 8. It would be great if someone could help me out. I am really sorry if this is too easy a problem, but I've been trying to solve it for a while and I'm stuck.

I was able to extract ICD9CM codes from cTAKES with the default resources i.e.
ctakesnorx.properties and ctakesnorx.script.

I wanted to get ICD10CM and ICD10PCS codes, so I downloaded the .script and .properties files from this source:
https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
and made corresponding changes to the cTakesHsql.xml file as mentioned by Sean in:
https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

But this doesn't seem to work. I played around a bit with the parameters in the following lines:

    <property key="snomedct_usTable" value="long"/>
    <property key="rxnormTable" value="text"/>
    <property key="icd9cmTable" value="text"/>
    <property key="icd10pcsTable" value="text"/>

Basically, I was getting blank outputs after making the change. I am using OntologyConceptUtil.getSchemeCodes(JCas) to get the outputs. I was getting an error with rxnormTable, so I commented that line out, and after that I was getting blank output. So I tried replacing value="text" with value="icd9cm" for key="icd9cmTable" and it started returning ICD9CM codes. But I couldn't get anything when I did the same with ICD10PCS. I again got a blank output.

Note: I did all this after commenting out:

    <property key="snomedTable" value="snomedct"/>
    <property key="rxnormTable" value="rxnorm"/>
    <property key="icd9Table" value="icd9cm"/>
    <property key="icd10Table" value="icd10pcs"/>

It would be great if someone could help me understand how the dictionary mechanism works, and also how to get ICD10CM and ICD10PCS codes from it.

(i) What are the keys and values mentioned above and where can I find these in the script or properties file? Is there a way I can access these? Please help me understand how this is working.

(ii) I have a csv file containing the ICD codes with the code in Column 1 and description in Column 2, and similarly for CPT/HCPCS codes. What steps do I need to take to make it work with OntologyConceptUtil.getSchemeCodes(JCas)? I saw on different forums that we can use the dictionary-gui tool from sandbox, but I don't really understand which files I need to run in that folder. Also, where in the project tree should I place this folder to make it run? And what parameters are required, if any, and where do I change them?

Thanks a lot.

Regards,
Pratik Agarwal