Thanks Sean  ... now I'm going to jog your memory:

I quickly went through the dictionary code.  You were right.  There was a
class AutoTermExtractor in org.apache.ctakes.gui.dictionary.umls which
looks like it did what you said.  But all of it is commented out.

Then there's another bit of code with a function extractAbbreviations() in
UmlsTermUtil, and this one relies on externalized files including this
one:  default/RightAbbreviations.txt.  And this file contains (SOB), one of
the abbreviations I was looking for.
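
For illustration, the splitting rule Sean describes in the quoted message
below (full, left and right synonyms, with an all-caps left side that fits
the right side as an acronym) could be sketched roughly as follows.  This is
only a Python sketch, not the cTAKES Java code: the function names, the two
patterns and the "initial letters" fit test are all assumptions, and it does
not consult RightAbbreviations.txt at all.

```python
import re

# Hypothetical patterns for "ABBR - Full term" and "ABBR (Full term)".
_DASH = re.compile(r"^(?P<left>\S+)\s+-\s*(?P<right>.+)$")
_PAREN = re.compile(r"^(?P<left>\S+)\s*\((?P<right>.+)\)$")


def is_fitting_acronym(left, right):
    """Assumed fit test: left is all caps and equals the initials of right."""
    if not (left.isalpha() and left.isupper()):
        return False
    words = [w for w in re.split(r"[\s-]+", right) if w]
    initials = "".join(w[0] for w in words)
    return initials.upper() == left


def expand_synonyms(term):
    """Return [full, left, right] for a fitting abbreviation, else [full]."""
    full = term.strip()
    for pattern in (_DASH, _PAREN):
        match = pattern.match(full)
        if match:
            left = match.group("left")
            right = match.group("right").strip()
            if is_fitting_acronym(left, right):
                return [full, left, right]
    return [full]
```

Under these assumptions, expand_synonyms("SOB - Shortness of breath") yields
the full string plus "SOB" and "Shortness of breath", while a plain term
such as "dyspnea" comes back unchanged.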

Now this file seems to exist in multiple versions:

cogitext:trunk-java8 peterabramowitsch$ find . -name "RightAbbreviations.txt" -exec wc -l {} \;
    1178 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/default/RightAbbreviations.txt
       0 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/small/RightAbbreviations.txt
       8 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/tim/RightAbbreviations.txt
       0 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/tiny/RightAbbreviations.txt
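
The same check can also report which copies actually mention a given
abbreviation.  A minimal Python sketch, assuming a plain substring match is
good enough (the internal format of RightAbbreviations.txt is not assumed
beyond being text); the function name and its parameters are hypothetical:

```python
from pathlib import Path


def scan_copies(root, filename="RightAbbreviations.txt", needle="SOB"):
    """Return (path, line_count, contains_needle) for each copy under root."""
    results = []
    for path in sorted(Path(root).rglob(filename)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # splitlines() counts lines, so a final line without a trailing
        # newline is counted here although `wc -l` would not count it.
        results.append((str(path), len(text.splitlines()), needle in text))
    return results


# Example usage from the checkout root:
#   for path, lines, found in scan_copies("."):
#       print(lines, found, path)
```

Note that all four paths above sit under target/classes, so this only looks
at whatever the last build copied there.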

Does this jog your memory enough to fill in the history and tell me what I
need to do?

Peter


On Fri, Aug 14, 2020 at 4:53 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Peter,
>
> I don't have an answer but I do have a question:
>
> In your mrconso.rrf, do you see a snomed line item for "SOB" or only "SOB
> -Shortness of breath" ?
>
> I think that the simple "SOB" and "sob" entries might be from other
> vocabularies.
>
> There is (was?) logic in the dictionary creator to multiply things like
> "SOB - Shortness of breath", "SOB (Shortness of breath)"  etc. and create 3
> synonym entries: full, left and right.  There is a requirement that the
> left side be all caps and a fitting acronym for the right side.  However, I
> vacillated on the correctness of this behavior as almost all terms already
> had the 3 entries.  I am not sure what the current version of the creator
> does.
>
> Dictionary creation is indeed a touchy operation.
>
> Sean
> ________________________________________
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: Thursday, August 13, 2020 11:57 PM
> To: dev@ctakes.apache.org
> Subject: Need a little more help on dictionaries [EXTERNAL]
>
> Hi All
>
> I'm able to create a subset with the UMLS mmsys tool, use the dictionary
> creator on the full UMLS release, create, install and tweak the scripts
> adding or removing aliases etc.  My goal is simply to add HUGO gene terms
> to SNOMED and RXNORM.
>
> However I must be missing some bit of information on the use of mmsys or
> the dictionary creator, because some very common terms are missing from my
> dictionary but present in the released sno_rx.
>
> As an example, take the acronym SOB.
> In mmsys, the term SOB is present in my subset, and it is mapped into
> SNOMED with the expected CUI 13404 and the same SNOMEDIDs as sno_rx.
> I see the cui_tui mapping it into the correct TUI for a finding:
> INSERT INTO TUI VALUES(13404,184)
> I see the CUI and the preferred term "dyspnea" in my *script file, and I
> can resolve it in a note using the default consumer, obtaining the
> correct SNOMED ID.
> I see lots of cui_term entries for the same CUI, and I can resolve them
> too, but SOB is not present in my cui terms.
> How did it get into sno_rx?
>
> So either - I am not using one of the tools correctly, or in creating
> SNO_RX, someone has added SOB by hand rather than using the creator.  And
> if they have, they have probably also done other tweaks.
>
> Sean, Ghandi or Jeff
> Can you explain this?
>
> Peter
>
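
Sean's question in the quoted message, whether the bare "SOB" string comes
from SNOMED or from other vocabularies, can be checked directly against
MRCONSO.RRF.  A minimal Python sketch, assuming the standard pipe-delimited
MRCONSO layout (SAB, the source vocabulary, in the 12th field and STR, the
term string, in the 15th); the function name is hypothetical and the file
path in the usage comment is a placeholder:

```python
from collections import defaultdict

# 0-based MRCONSO.RRF column positions: source vocabulary and term string.
SAB_COL, STR_COL = 11, 14


def sources_for_string(lines, wanted):
    """Map each matching term string to the set of vocabularies supplying it.

    A string matches if it equals `wanted` (case-insensitively) or starts
    with `wanted` followed by a space, e.g. "SOB - Shortness of breath".
    """
    wanted = wanted.lower()
    hits = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR_COL:
            continue  # skip malformed or truncated rows
        text = fields[STR_COL]
        lowered = text.lower()
        if lowered == wanted or lowered.startswith(wanted + " "):
            hits[text].add(fields[SAB_COL])
    return dict(hits)


# Example usage (path is a placeholder for the subset's META directory):
#   with open("MRCONSO.RRF", encoding="utf-8") as rrf:
#       for text, sabs in sorted(sources_for_string(rrf, "SOB").items()):
#           print(text, sorted(sabs))
```

If the bare "SOB" row only shows up with non-SNOMED sources, that would be
consistent with Sean's guess about where the simple entries come from.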
