Hi Chris,

Just off the cuff: have you tried using just the relative path
"org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"?
Relative paths within the $CLASSPATH should work in trunk, but perhaps not
until the next release? I haven't tested recently (should add a junit ...).

Sean

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Wednesday, October 07, 2015 11:34 PM
To: dev@ctakes.apache.org
Subject: Re: How to update cTAKES so that new top level categories come out based on local dictionary?

Hi Sean,

One more question too:

So, I put the bsv files in the resources directory as part of my
Apache cTAKES 3.2.2 distribution:

/usr/local/apache-ctakes-3.2.2-bin/resources

underneath:

org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv

and I referenced it like this (as an example just including the
dictionary def, path is same for the concept factory):

<dictionary>
   <name>CustomCuiRareWord</name>
   <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
   <properties>
      <property key="bsvPath" value="resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"/>
   </properties>
</dictionary>

Here’s what I see in the logs:

<snip>
7 Oct 2015 20:31:01 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
07 Oct 2015 20:31:01 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
07 Oct 2015 20:31:01 INFO DictionaryDescriptorParser - Parsing dictionary specifications: /data/hosts/web-dev.aws-redda.celgene.com/local/cdeploy/shangridocs/shangridocs-tika/ctakes/apache-ctakes-3.2.2/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
07 Oct 2015 20:31:01 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann: ..
07 Oct 2015 20:31:02 INFO UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann has been validated
07 Oct 2015 20:31:02 INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx: ......
07 Oct 2015 20:31:04 INFO JdbcConnectionFactory - Database connected
07 Oct 2015 20:31:04 ERROR BsvRareWordDictionary - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
07 Oct 2015 20:31:04 ERROR BsvConceptFactory - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
</snip>

I’ve tried all variants, e.g., in the cTakesHsql.xml file I see resources as
a prefix for the hsqldb file, so I tried that too, and it doesn’t work. I’ve
also tried it without resources as a prefix; that doesn’t work either.

Any ideas?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Finan, Sean" <sean.fi...@childrens.harvard.edu>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Tuesday, October 6, 2015 at 2:04 PM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: RE: How to update cTAKES so that new top level categories come out based on local dictionary?

>Hi Chris,
>
>I use bsv to denote "bar separated value" - also known as "pipe
>delimited". I typically name the files with a ".bsv" extension, and they
>are just plain old boring ascii flat files.
>There should be multiple columns in the bsv file separated by the '|'
>character. The following are all valid per-line formats:
>CUI|text
>CUI|TUI|text
>CUI|TUI|text|preferredText
>It doesn't matter which format you choose, the parser will auto-detect
>per-line. Starting a line with "//" or "#" indicates that it is a
>comment and should be ignored.
>
>
>To add the bsv dictionary to your pipeline you just need to edit the
>resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml file
>and add a couple new sections.
>Within the <dictionaries> section, add:
>   <dictionary>
>      <name>CustomCuiRareWord</name>
>      <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>      <properties>
>         <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>      </properties>
>   </dictionary>
>Within the <conceptFactories> section, add:
>   <conceptFactory>
>      <name>CustomCuiConcept</name>
>      <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
>      <properties>
>         <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>      </properties>
>   </conceptFactory>
>Within the <dictionaryConceptPairs> section, add:
>   <dictionaryConceptPair>
>      <name>CustomPair</name>
>      <dictionaryName>CustomCuiRareWord</dictionaryName>
>      <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
>   </dictionaryConceptPair>
>You can change all of the [Custom**] names, and you should obviously
>point to the actual path of your bsv file.
>
>In addition to detecting your column count/style, upon loading the text
>will be lower-cased and tokenized and the terms will be indexed by rare
>word (for fast lookup). Also, you do not need to write out the whole
>"C1234567" or "T123" cui tui codes. The default prefix characters and
>padding zeros are automatically added.
>Cuis "1", "01", "C1" and "C01" will
>all be stored as "C0000001" and Tuis are handled likewise. If you have
>custom cuis then it will honor non-"C" prefixes and still pad zeros
>automatically based upon the longest entry. For instance, if your bsv
>has "CAM1", "CAM12" and "CAM12345" then the stored custom cuis should be
>"CAM00001", "CAM00012" and "CAM12345".
>
>I think that is about all that there is to it ...
>
>Sean
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>Sent: Tuesday, October 06, 2015 4:31 PM
>To: dev@ctakes.apache.org
>Subject: Re: How to update cTAKES so that new top level categories come out based on local dictionary?
>
>Hi Sean,
>
>Thanks so much for your reply. For now I don’t care about the secondary
>codes and I for sure have < 1000 terms. Can you tell me how to wire up
>the BSV file by editing specific places in cTAKES? What specific commands
>should I run, and what format should the BSV file have? I must admit
>I have never heard of BSV files and the Internet varies on these between
>Bluespec System Verilog and BASIC bsave files.
>
>Then after I make the BSV file, what steps next? Recompile cTAKES? Can
>I take the BSV file and simply point to it from a binary installation of
>cTAKES? Thank you!
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: "Finan, Sean" <sean.fi...@childrens.harvard.edu>
>Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>Date: Tuesday, October 6, 2015 at 8:05 AM
>To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>Subject: RE: How to update cTAKES so that new top level categories come out based on local dictionary?
>
>>Hi Chris,
>>
>>There are a few ways to do this:
>>1. Create an additional dictionary with the terms of interest and add it
>>as a source
>>2. Create a new dictionary hsqldb that contains everything, old and new
>>3. Add to the existing hsqldb dictionary
>>
>>The best approach for you would probably depend upon
>>1. How many new terms you have
>>2. Whether or not you desire additional codes, e.g. rxnorm, snomed
>>
>>If you don't have many new terms (<1000) and you don't care about
>>secondary codes then the easiest thing would be to create a BSV file with
>>the new terms and cuis.
>>
>>If you have a lot of new terms or do care about secondary codes, then a
>>less facile solution would be to create a new hsqldb with only the new
>>info or a complete replacement with new and old/existing terms.
>>Of the
>>two hsql options, creating a new all-inclusive database would probably be
>>easier unless you want to learn the ins and outs of hsql. If all of the
>>terms are in the umls, then the new all-inclusive hsqldb would definitely
>>be easiest (I think) as you could use the dictionary tool to create it.
>>
>>If you let me know your exact situation then I may be able to better
>>expound.
>>
>>Sean
>>
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>>Sent: Monday, October 05, 2015 7:36 PM
>>To: dev@ctakes.apache.org
>>Subject: How to update cTAKES so that new top level categories come out based on local dictionary?
>>
>>Hi cTAKES team,
>>
>>Hope you’re well! I had a quick question. I was wondering if someone
>>could provide me a step-by-step guide to updating cTAKES to be based
>>off a local dictionary, so that in addition to e.g.,
>>
>>ProceduralMention
>>  Value1 position etc
>>  Value2 position etc
>>
>>MedicationMention
>>  Value1 position etc
>>  Value2 position etc
>>
>>it also outputs:
>>
>>NewTopLevelCategoryFromMyDictionary
>>  FoundValue1 position etc
>>  FoundValue2 position etc
>>
>>I realize this has something to do with updating the annotation
>>descriptions etc in XML, so if someone could just tell me what
>>to update I’d really appreciate it.
>>
>>Thank you!
>>
>>Cheers,
>>Chris
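
Note on the BSV conventions discussed in this thread: the per-line formats and the cui padding that Sean describes can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the cTAKES parser. The function names (split_code, normalize_codes, parse_bsv_line) are hypothetical, tokenization and rare-word indexing are left out, and the rule that a custom prefix is padded to the longest digit run seen for that prefix is one reading of "based upon the longest entry".

```python
import re

# A code is an optional alphabetic prefix followed by digits, e.g. "1", "C01", "CAM12".
_CODE = re.compile(r"^([A-Za-z]*)(\d+)$")


def split_code(code):
    """Split a code into (upper-cased prefix, digit string)."""
    match = _CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a prefix+digits code: {code!r}")
    return match.group(1).upper(), match.group(2)


def normalize_codes(codes, default_prefix="C", default_width=7):
    """Pad codes the way the thread describes (assumed reading).

    Codes with no prefix or the default prefix become e.g. "C0000001".
    Codes with a custom prefix keep it and are zero-padded to the longest
    digit run seen for that prefix.
    """
    parts = [split_code(code) for code in codes]
    widths = {}
    for prefix, digits in parts:
        if prefix not in ("", default_prefix):
            widths[prefix] = max(widths.get(prefix, 0), len(digits))
    normalized = []
    for prefix, digits in parts:
        if prefix in ("", default_prefix):
            normalized.append(default_prefix + digits.zfill(default_width))
        else:
            normalized.append(prefix + digits.zfill(widths[prefix]))
    return normalized


def parse_bsv_line(line):
    """Parse one BSV line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("//", "#")):
        return None
    columns = [column.strip() for column in stripped.split("|")]
    tui = preferred = None
    if len(columns) == 2:        # CUI|text
        cui, text = columns
    elif len(columns) == 3:      # CUI|TUI|text
        cui, tui, text = columns
    elif len(columns) == 4:      # CUI|TUI|text|preferredText
        cui, tui, text, preferred = columns
    else:
        raise ValueError(f"expected 2 to 4 columns, got {len(columns)}")
    # The thread says term text is lower-cased on load.
    return {"cui": cui, "tui": tui, "text": text.lower(), "preferred_text": preferred}


print(normalize_codes(["1", "01", "C1", "C01"]))
# ['C0000001', 'C0000001', 'C0000001', 'C0000001']
```

A file could be checked with this before pointing bsvPath at it, by running parse_bsv_line over each line and normalize_codes over the collected cuis.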