> Finally an explanation that makes sense.
-- It frequently takes a while to get one of those out of me ...
> I don't have check-in privileges so will keep it private for now.
-- We shall have to do something about that.

Cheers,
Sean
________________________________________
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Friday, August 14, 2020 1:17 PM
To: dev@ctakes.apache.org
Subject: Re: Need a little more help on dictionaries [EXTERNAL]

* External Email - Caution *

Hurray! Finally an explanation that makes sense. I just couldn't figure out
how you could have made sno_rx with that dictionary creator. Clearly, those
helper files represent a LOT of work.

I have locally modified the dictionary creator code to look for the system
property ctakes.dictgui_helperdata as a way to point it to another of those
directories. I don't have check-in privileges so will keep it private for now.

Many thanks for your help.

Peter

On Fri, Aug 14, 2020 at 9:51 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Peter,
>
> shining a flashlight back into the dark ages ...
>
> You have found the advanced configuration directories!
>
> Those actually precede the gui dictionary creator and were a big part of
> formatting with the previous cli dictionary creator. The cli was versatile
> but not simple. The default collection of configuration files for the cli
> had a lot more going on.
>
> I think that I made "tiny/" directory the default for the gui because it
> didn't do as much manipulation and I wanted things to be a greater 1:1
> match with the source.
>
> I obviously used something other than the simple "tiny/" configuration
> when I made sno_rx_16ab. I remember running repeated tests on some
> corpora as well as manually inspecting the produced databases.
>
> I can't believe that I had forgotten all of this.
>
> You should be able to mix and match files from the different configuration
> directories and just throw them into your own directory (or tiny/) then
> point DEFAULT_.. to your directory and recompile.
>
>
> Sean
>
> ________________________________________
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: Friday, August 14, 2020 12:22 PM
> To: dev@ctakes.apache.org
> Subject: Re: Need a little more help on dictionaries [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean
>
> I think I found the answer, and I have one question.
>
> In dictionary creator, the hardwired dir is "tiny" that in fact has an
> empty file for those abbreviations
>
> In DictionaryBuilder.java:
>
>    static private final String DEFAULT_DATA_DIR =
>          "org/apache/ctakes/gui/dictionary/data/tiny";
>    ...
>    final UmlsTermUtil umlsTermUtil = new UmlsTermUtil( DEFAULT_DATA_DIR );
>
> The command line args are not used in this application, neither are
> sysprops or environment vars so there's no way to change it short of
> recompiling.
>
> So the question is: do you know why the empty version is the default?
>
> Peter
>
>
>
> On Fri, Aug 14, 2020 at 4:53 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Peter,
> >
> > I don't have an answer but I do have a question:
> >
> > In your mrconso.rrf, do you see a snomed line item for "SOB" or only "SOB
> > -Shortness of breath" ?
> >
> > I think that the simple "SOB" and "sob" entries might be from other
> > vocabularies.
> >
> > There is (was?) logic in the dictionary creator to multiply things like
> > "SOB - Shortness of breath", "SOB (Shortness of breath)" etc. and create 3
> > synonym entries: full, left and right. There is a requirement that the
> > left side be all caps and a fitting acronym for the right side. However, I
> > vacillated on the correctness of this behavior as almost all terms already
> > had the 3 entries. I am not sure what the current version of the creator
> > does.
> >
> > Dictionary creation is indeed a touchy operation.
> >
> > Sean
> > ________________________________________
> > From: Peter Abramowitsch <pabramowit...@gmail.com>
> > Sent: Thursday, August 13, 2020 11:57 PM
> > To: dev@ctakes.apache.org
> > Subject: Need a little more help on dictionaries [EXTERNAL]
> >
> > * External Email - Caution *
> >
> >
> > Hi All
> >
> > I'm able to create a subset with the UMLS mmsys tool, use the dictionary
> > creator on the full UMLS release, create, install and tweak the scripts
> > adding or removing aliases etc. My goal is simply to add HUGO gene terms
> > to SNOMED and RXNORM.
> >
> > However I must be missing some bit of information on the use of mmsys or
> > the dictionary creator, because some very common terms are missing from my
> > dictionary but present in the released sno_rx
> >
> > As an example, the acronym SOB
> > in mmsys, the term SOB is present in my subset, and it is mapped into
> > SNOMED with the expected CUI 13404 and SNOMEDIDs same as sno_rx
> > I see the cui_tui mapping it into the correct TUI for a finding INSERT
> > INTO TUI VALUES(13404,184)
> > I see the cui and the preferred term "dyspnea" in my *script file, and I
> > can resolve it in a note using the default consumer and obtaining the
> > correct SNOMED ID
> > I see lots of cui_term entries for the same CUI, and I can resolve them
> > too. but SOB is not present in my cui terms.
> > How did it get there?
> >
> > So either - I am not using one of the tools correctly, or in creating
> > SNO_RX, someone has added SOB by hand rather than using the creator. And
> > if they have, they have probably also done other tweaks.
> >
> > Sean, Ghandi or Jeff
> > Can you explain this?
> >
> > Peter
> >
>
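
For readers following the thread, two of the mechanisms discussed above can be sketched in Java. This is an illustration only, not the cTAKES source. The class name DictionaryThreadSketch and its methods are hypothetical. The property name ctakes.dictgui_helperdata and the default path are copied from the messages above. The "fitting acronym" test (the left side must equal the first letters of the words on the right side) is an assumption about the rule Sean describes, and the actual dictionary creator may do something different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of this is taken from the cTAKES code base.
final class DictionaryThreadSketch {

   // Default quoted from DictionaryBuilder.java in the thread.
   static final String DEFAULT_DATA_DIR = "org/apache/ctakes/gui/dictionary/data/tiny";
   // System property named in Peter's message.
   static final String DATA_DIR_PROPERTY = "ctakes.dictgui_helperdata";

   // Use the system property when it is set and non-blank, else the compiled-in default.
   static String resolveDataDir() {
      final String override = System.getProperty( DATA_DIR_PROPERTY );
      if ( override == null || override.trim().isEmpty() ) {
         return DEFAULT_DATA_DIR;
      }
      return override.trim();
   }

   // For "SOB - Shortness of breath" or "SOB (Shortness of breath)" return
   // the full term plus the left and right parts; otherwise only the full term.
   static List<String> expandAcronymTerm( final String term ) {
      final List<String> synonyms = new ArrayList<>();
      synonyms.add( term );
      String left = null;
      String right = null;
      final int dash = term.indexOf( " - " );
      final int paren = term.indexOf( " (" );
      if ( dash > 0 ) {
         left = term.substring( 0, dash ).trim();
         right = term.substring( dash + 3 ).trim();
      } else if ( paren > 0 && term.endsWith( ")" ) ) {
         left = term.substring( 0, paren ).trim();
         right = term.substring( paren + 2, term.length() - 1 ).trim();
      }
      if ( left != null && isAcronymFor( left, right ) ) {
         synonyms.add( left );
         synonyms.add( right );
      }
      return synonyms;
   }

   // ASSUMPTION: "fitting" means all-caps and equal to the word initials of the expansion.
   static boolean isAcronymFor( final String acronym, final String expansion ) {
      if ( acronym.isEmpty() || expansion.isEmpty() ) {
         return false;
      }
      for ( char c : acronym.toCharArray() ) {
         if ( !Character.isUpperCase( c ) ) {
            return false;
         }
      }
      final StringBuilder initials = new StringBuilder();
      for ( String word : expansion.split( "\\s+" ) ) {
         if ( !word.isEmpty() ) {
            initials.append( Character.toUpperCase( word.charAt( 0 ) ) );
         }
      }
      return initials.toString().equals( acronym );
   }

   public static void main( final String[] args ) {
      System.out.println( "data dir: " + resolveDataDir() );
      System.out.println( expandAcronymTerm( "SOB - Shortness of breath" ) );
      System.out.println( expandAcronymTerm( "SOB (Shortness of breath)" ) );
   }
}
```

Launching the JVM with -Dctakes.dictgui_helperdata=<directory> would make resolveDataDir() return that directory instead of the "tiny" default, which avoids recompiling. The sketch does not call UmlsTermUtil, so whether the returned value must be a classpath resource path or a file system path is left open here.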