Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Miller, Timothy Tue, 15 Sep 2020 12:55:10 -0700

Peter,
The parts of speech come from the ctakes-pos-tagger module, which uses
the OpenNLP POS tagger trained on clinical data. There is a
constituency parser as well, which I think could in theory tag even
better (it might be able to give you a unary branch in a tree from NN
-> CD -> <number>), but it is a lot slower than the POS tagger, and we
probably don't want to make it a requirement for simple dictionary
pipelines.
Tim


On Tue, 2020-09-15 at 12:12 -0700, Peter Abramowitsch wrote:
> 
> Sean, this conversation raises a question for me that I've had for a
> while. Does the term-finding mechanism actually use a treebank to find
> the POS, or does it use another, less rigorous approach? If it were
> rigorous, wouldn't it be able to tag a pure number as an NN in the
> role of object if it played the corresponding role in the sentence?
> 
> I've not had the same problem as Ayyub, but I have been wondering why
> one needed to disable the identification of cm as a genetic acronym
> because of situations where cm is clearly part of a unit of measure
> and would show up as an entity's modifier in a treebank.
> 
> Does the question make sense?
> 
> Peter
> 
> On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> 
> > I should mention that going the Paragraph route would only impact
> > term lookup.
> > ________________________________________
> > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > Sent: Tuesday, September 15, 2020 11:54 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you, Sean, for the response. We shall definitely try that. I
> > have one question on the "F84.1" problem: since we have now developed
> > a lot of features based on the output from cTAKES, would the impact
> > of changing the sentenceDetectorAnnotator be significant?
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > -----Original Message-----
> > From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> > Sent: Tuesday, September 15, 2020 9:06 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Abad,
> > 
> > The first thing that I would try for the "97112" problem is changing
> > the parts of speech that are ignored for lookup. Right now a pure
> > number is ignored; it is not a word. So, similar to what I said in my
> > previous email, change the dictionary lookup parameter exclusionTags.
> > But to make sure that you get everything, you can first try no
> > exclusions:
> > set exclusionTags=""
> > 
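A toy Python sketch of the filtering Sean describes: dictionary lookup skips tokens whose part-of-speech tag is listed in exclusionTags, so a pure number tagged CD (cardinal number) never reaches the dictionary. The tag list, the example sentence and its tags, and the helper names are assumptions for illustration only; this is not cTAKES code, and the actual default list is whatever your cTAKES version's dictionary lookup annotator is configured with.

```python
# Hypothetical illustration of POS-based exclusion before dictionary lookup.
# ASSUMED_EXCLUSION_TAGS is a guess at a typical list of Penn Treebank tags,
# not a copy of the cTAKES default; check your own pipeline's exclusionTags.
ASSUMED_EXCLUSION_TAGS = frozenset(
    "VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,"
    "PRP,PRP$,RP,TO,WDT,WP,WPS,WRB".split(",")
)


def parse_exclusion_tags(value: str) -> frozenset:
    """Parse a comma-separated tag list; an empty string means no exclusions."""
    return frozenset(tag.strip() for tag in value.split(",") if tag.strip())


def lookup_candidates(tagged_tokens, exclusion_tags):
    """Keep only tokens whose POS tag is not excluded from lookup."""
    return [token for token, tag in tagged_tokens if tag not in exclusion_tags]


# Hypothetical tagged sentence: the CPT code is tagged CD, like any number.
sentence = [("Patient", "NN"), ("received", "VBD"), ("97112", "CD"), ("therapy", "NN")]

print(lookup_candidates(sentence, ASSUMED_EXCLUSION_TAGS))
print(lookup_candidates(sentence, parse_exclusion_tags("")))
```

With the assumed list, "97112" is dropped along with the verb; with an empty exclusionTags string, as Sean suggests trying first, every token, including the number, becomes a lookup candidate.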
> > My guess with the F84.1 problem is that your sentence splitter is
> > splitting "F84.1" but not splitting "F84 . 1".
> > 
> > I think that the best way to start debugging is to add the
> > PrettyTextWriter to the end of the piper and look at its output (see
> > my previous email). It will print each sentence on a line and
> > indicate the part of speech for each token. If you can quickly and
> > easily see what the system is doing, then you might start to
> > understand what needs to be changed to fit your data.
> > 
> > Sean
> > ________________________________________
> > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > Sent: Tuesday, September 15, 2020 11:15 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you, Sean, for the detailed response. I think there was a
> > miscommunication from our end about the requirement. Your solution of
> > adding spaces between the entries worked, but it required the input
> > text to have the spaces as well. If the text came in as 'F84.1',
> > cTAKES didn't recognize the token, but if the text came in as
> > 'F84 . 1', cTAKES recognized the tokens for the INSERT script below.
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
> > 
> > But we encountered a similar issue when we configured an INSERT entry
> > for CPT codes as below:
> > 
> > INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')
> > 
> > Here 97112 is a CPT code (CPT codes usually don't have decimals or
> > '.'). We expected cTAKES to recognize the CPT code '97112' as a
> > separate token, but it didn't. Could you please advise us on why this
> > issue came up?
> >
> > Is there something wrong in the configuration? Do we need something
> > additional for cTAKES to recognize the code alone as a separate
> > token? Is there any other way to get the respective ICD/CPT code of
> > the identified annotation from cTAKES, such as querying the CPT/ICD
> > table using the fetched CUI? Kindly advise.
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > -----Original Message-----
> > From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> > Sent: Monday, September 14, 2020 9:35 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Abad,
> > 
> > 
> > I think that you need to make only one minor change.
> > 
> > 
> > cTAKES uses "tokens" for identification and not the actual text.
> > Tokenization turns text such as "F84.1" into "F84 . 1": the first
> > token is F84, followed by a token encompassing '.' and another with
> > '1'. The way this is indicated in the .script file is by adding a
> > space between each token. This makes the full entry:
> > 
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
> > 
> > 
> > Notice that the token count is now 3 and the full text contains the
> > between-token spaces. This would carry forward for the other entries,
> > such as:
> > 
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
> > 
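A minimal Python sketch of the bookkeeping Sean describes, assuming a simplified tokenizer that keeps runs of letters and digits together and makes every other non-space character its own token. That regex is a stand-in chosen only because it reproduces the "F84.1" to "F84 . 1" split; it is not the real cTAKES tokenizer, and the function names are hypothetical. The sketch derives the space-joined TEXT, RINDEX and TCOUNT for both of the entries above.

```python
import re

# Assumption: simplified stand-in for the cTAKES tokenizer. Runs of
# letters/digits form one token; any other non-space character is its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def tokenize(text: str) -> list:
    """Split text into tokens using the simplified rule above."""
    return TOKEN_PATTERN.findall(text)


def term_fields(text: str, rare_word: str):
    """Return (space-joined TEXT, RINDEX, TCOUNT) for a CUI_TERMS row.

    Raises ValueError if rare_word is not one of the tokens.
    """
    tokens = tokenize(text)
    return " ".join(tokens), tokens.index(rare_word), len(tokens)


print(term_fields("F84.1", "F84"))      # ('F84 . 1', 0, 3)
print(term_fields("F84.1 pdd", "pdd"))  # ('F84 . 1 pdd', 3, 4)
```

The printed values line up with the RINDEX and TCOUNT columns of the two INSERT statements: the token count grows from 1 to 3 once "F84.1" is treated as three tokens.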
> > 
> > Sean
> > 
> > 
> > ________________________________
> > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > Sent: Monday, September 14, 2020 11:32 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Team,
> > 
> > I hope you are all doing well. With your support, we were able to
> > successfully add our required synonyms to the existing dictionary and
> > could see that cTAKES was picking them up. Now we have a requirement
> > to configure ICD and CPT as well, so we followed the steps mentioned
> > in the cTAKES wiki and generated the respective .script file.
> > 
> > The newly created dictionary, which comprises SNOMEDCT_US, RxNORM,
> > ICD10 and CPT, identifies the descriptions as expected, but we have a
> > requirement to extract the ICD code for the respective description.
> > For example, given text like the following:
> >
> > 'F84.1 pervasive developmental disorders'
> > 
> > We would need cTAKES to recognize F84.1 as a token, or at least as an
> > attribute in one of the IdentifiedAnnotations. To achieve this, based
> > on our prior experience, we tried to tweak the dictionary by adding
> > synonyms for the existing CUI as below:
> >
> > INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
> > INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
> > INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
> > 
> > We have seen that cTAKES can identify 'F84' alone as a token, but it
> > won't consider it whenever a '.' is encountered. As an end result,
> > cTAKES can't return ICD codes like F84.1 or M25.6 as separate tokens.
> > Since almost all ICD codes have a '.' in them, this way of tweaking
> > the dictionary is not working. In fact, cTAKES recognizes the digit
> > after the decimal point within a FractionAnnotation.
> > 
> > Does cTAKES have the capability to return a code, such as the ICD
> > code, when retrieving the token, either as an individual token or as
> > an attribute of one of the tokens?
> > 
> > Is there any other way the dictionary can be tweaked so that a
> > synonym addition like the one below will recognize the ICD code as a
> > token and return it from cTAKES?
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
> > 
> > 
> > Kindly check and advise us on how to proceed in this situation.
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > From: Remy Sanouillet <re...@foreseemed.com>
> > Sent: Tuesday, June 2, 2020 7:23 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES
> > 
> > Hi Abad,
> > 
> > •       How can we point the cTAKES application to multiple
> > dictionaries? Currently only sno_rx_16ab is pointed to by the
> > application; how can I tweak it to point to multiple dictionaries
> > simultaneously? Or did you mean to create a fresh dictionary with all
> > the vocabularies and point cTAKES to just that?
> > 
> > If you go back in the archive a bit, you should find a thread where I
> > went into detail on how to add multiple dictionaries. Combining all
> > dictionaries into a fresh dictionary is not recommended, for obvious
> > reasons. If you can't find the thread, I will dig it up.
> > 
> > •       So for these edits, I will have to add INSERT queries to the
> > respective tables in the sno_rx_16ab.script file, right? Do I need to
> > make any more changes for these tokens to be reflected in cTAKES?
> > 
> > Nope! That is all that is needed, and the next time you launch
> > cTAKES, it should recognize your new entries.
> > 
> > •       If it is a non-existing CUI, I can get the respective CUI and
> > TUI from here: https://uts.nlm.nih.gov/metathesaurus.html, right?
> > 
> > Correct! Remember that the ontology has multiple inheritance, so you
> > need to grab all the TUIs for a given CUI.
> > 
> > •       Based on the source, I will have to add an entry to the
> > respective table, right? Like SNOMED, RxNORM or ICD 10, and a CUI
> > will belong to one of them, not to all. Correct me if I am wrong on
> > this understanding.
> > 
> > That is also correct. And most of the time, the dictionaries only
> > contain one CODE table, so it is not even a question. However,
> > sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and
> > RXNORM. They mostly do not overlap. I do remember that there were a
> > couple of exceptions, but where that happens, the Metathesaurus will
> > show it. For example, 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US
> > codes (372815001 and 68088000) *and* an RXNORM code of 149.
> > 
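To illustrate the Acebutolol example (and the idea raised earlier in the thread of fetching codes by CUI), here is a small Python sketch that uses an in-memory sqlite3 database as a stand-in for the HSQLDB dictionary. The table names SNOMEDCT_US and RXNORM follow this thread, and storing the CUI as a bare integer follows the INSERT examples above (4352, 200845); the column names (CUI, CODE) and the helper name are assumptions, so check the CREATE TABLE lines in your own .script file before adapting it.

```python
import sqlite3

# Stand-in for the dictionary's code tables; sqlite3 replaces HSQLDB here and
# the column names (CUI, CODE) are assumptions. CUIs are bare integers,
# e.g. C0000946 -> 946, mirroring the INSERT examples in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SNOMEDCT_US (CUI INTEGER, CODE TEXT)")
conn.execute("CREATE TABLE RXNORM (CUI INTEGER, CODE TEXT)")
conn.executemany("INSERT INTO SNOMEDCT_US VALUES (?, ?)",
                 [(946, "372815001"), (946, "68088000")])
conn.execute("INSERT INTO RXNORM VALUES (?, ?)", (946, "149"))

CODE_TABLES = ("SNOMEDCT_US", "RXNORM")


def codes_for_cui(cui: int) -> dict:
    """Collect every code mapped to one CUI, grouped by vocabulary table."""
    result = {}
    for table in CODE_TABLES:  # table names come from a fixed tuple, not user input
        rows = conn.execute(
            f"SELECT CODE FROM {table} WHERE CUI = ? ORDER BY CODE", (cui,))
        codes = [code for (code,) in rows]
        if codes:
            result[table] = codes
    return result


print(codes_for_cui(946))  # Acebutolol, C0000946
```

Because a CUI can appear in more than one code table, and more than once in the same table, the sketch returns a list per table rather than a single code.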
> > •       The PREFTERM table will have only one entry for each CUI,
> > right? Basically it's a one-to-one mapping between CUI and PREFTERM.
> > Correct me if I am wrong on this understanding.
> > 
> > You are correct here also. It is a one-to-one mapping, although the
> > system appears to tolerate a missing PREFTERM.
> > 
> > Rémy Sanouillet
> > NLP Engineer
> > re...@foreseemed.com
> > 
> > 
> > ForeSee Medical, Inc.
> > 12555 High Bluff Drive, Suite 100
> > San Diego, CA 92130
> > 
> > NOTICE: This e-mail message and all attachments transmitted with it
> > are intended solely for the use of the addressee and may contain
> > legally privileged and confidential information. If the reader of
> > this message is not the intended recipient, or an employee or agent
> > responsible for delivering this message to the intended recipient,
> > you are hereby notified that any dissemination, distribution,
> > copying, or other use of this message or its attachments is strictly
> > prohibited. If you have received this message in error, please notify
> > the sender immediately by replying to this message and please delete
> > it from your computer.
> > 
> > 
> > On Mon, Jun 1, 2020 at 7:56 AM <abad.ay...@cognizant.com> wrote:
> > Thank you, Remy and Peter, for your responses. I hope you are doing
> > well and staying safe during this lockdown period. Could you please
> > help me with the queries below on creating an additional dictionary?
> > 
> > 
> > •       How do I create an additional dictionary? Did you mean using
> > the UMLS tool, so that we create .script files from .RRF files with
> > it?
> > 
> > •       How can we point the cTAKES application to multiple
> > dictionaries? Currently only sno_rx_16ab is pointed to by the
> > application; how can I tweak it to point to multiple dictionaries
> > simultaneously? Or did you mean to create a fresh dictionary with all
> > the vocabularies and point cTAKES to just that?
> > 
> > I believe Remy was explaining how to edit the existing dictionary,
> > where I would deal with two scenarios: one with an existing CUI and
> > the other with a non-existing CUI. Could you please resolve the
> > queries below regarding the same?
> > 
> > 
> > •       So for these edits, I will have to add INSERT queries to the
> > respective tables in the sno_rx_16ab.script file, right? Do I need to
> > make any more changes for these tokens to be reflected in cTAKES?
> > 
> > •       If it is a non-existing CUI, I can get the respective CUI and
> > TUI from here: https://uts.nlm.nih.gov/metathesaurus.html, right?
> > 
> > •       Based on the source, I will have to add an entry to the
> > respective table, right? Like SNOMED, RxNORM or ICD 10, and a CUI
> > will belong to one of them, not to all. Correct me if I am wrong on
> > this understanding.
> > 
> > •       The PREFTERM table will have only one entry for each CUI,
> > right? Basically it's a one-to-one mapping between CUI and PREFTERM.
> > Correct me if I am wrong on this understanding.
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > From: Remy Sanouillet <re...@foreseemed.com>
> > Sent: Friday, May 29, 2020 9:25 PM
> > To: dev@ctakes.apache.org
> > Cc: u...@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES
> > 
> > Hello Abad,
> > 
> > The short answer is yes, sno_rx_16ab can be "hacked". A couple of
> > caveats: any mistake can stop all recognition, and you will lose all
> > your mods on updates. So an additional dictionary is the recommended
> > approach.
> > 
> > There are two cases: either the CUI you are adding already exists and
> > you are just adding a synonym, or the CUI does not exist yet. In the
> > first case, you only need to add one line:
> > INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
> > where:
> > 
> >   *   CUI is the CUI, 'nuff said
> >   *   TEXT is the tokenized lowercase string for the entry. In your
> >       case 'pap smear'. Most punctuation is a separate token. Single
> >       quotes are escaped by doubling them
> >   *   RWORD is the one token in TEXT that is the most indicative
> >       (least common), which will be used as the index in the lookup.
> >       In your case probably 'pap', since it is not as common as 'smear'
> >   *   RINDEX is the index of RWORD in TEXT. The first token is 0,
> >       which is the case for 'pap'
> >   *   TCOUNT is the token count for TEXT. In your case, 2
> > So you would want to add:
> > INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
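A minimal Python sketch that assembles a CUI_TERMS row from the field definitions above, including the doubled single quotes. It assumes the caller passes TEXT already tokenized and lowercased and chooses RWORD by hand (picking the least common token is a judgment call the sketch does not automate); the helper names are hypothetical.

```python
def sql_string(value: str) -> str:
    """Quote a value for the .script file; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def cui_terms_insert(cui: int, tokens: list, rword: str) -> str:
    """Build 'INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)'.

    tokens must already be tokenized and lowercased; rword must be one of them.
    """
    text = " ".join(tokens)          # TEXT: tokens separated by single spaces
    rindex = tokens.index(rword)     # RINDEX: 0-based position of RWORD
    tcount = len(tokens)             # TCOUNT: number of tokens in TEXT
    return (f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
            f"{sql_string(text)},{sql_string(rword)})")


print(cui_terms_insert(200845, ["pap", "smear"], "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

The printed line matches the pap smear entry above; `tokens.index` raises ValueError if RWORD is missing, which catches a mismatched RWORD before it reaches the .script file.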
> > 
> > If the entry is a non-existing one, you will need to add a few more
> > lines. Their positions are unimportant as long as they are below the
> > header lines (below the final "SET SCHEMA PUBLIC" line).
> > 
> >   1.  INSERT INTO TUI VALUES(CUI,TUI)
> >       One line for each TUI in the taxonomy
> >   2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED), assuming you are
> >       adding a SNOMED code
> >   3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM), where PREFTERM is the
> >       pretty string that describes the entry. It need not correspond
> >       to any indexed entry. It is used for display once the lookup has
> >       been successful.
> > That's it. Use at your own discretion. No guarantees.
> > 
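A Python sketch of the extra rows for a new CUI, spliced in directly after the final SET SCHEMA PUBLIC line as described above. Everything here is an assumption for illustration: the helper names, the example CUI, TUI and SNOMED values (placeholders, not real UMLS data), and the choice to pass TUIs and codes through verbatim so you can match the exact format of the existing rows in your .script file. Since any mistake can stop all recognition, work on a backup copy.

```python
def quote(value: str) -> str:
    """Quote a string literal for the .script file (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def new_concept_lines(cui: int, tuis: list, snomed_codes: list, prefterm: str) -> list:
    """Rows for a CUI not yet in the dictionary (CUI_TERMS rows added separately).

    tuis and snomed_codes are inserted verbatim; write them exactly as the
    existing rows in your .script file do (an assumption, not a known format).
    """
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]  # one per TUI
    lines += [f"INSERT INTO SNOMEDCT_US VALUES({cui},{code})" for code in snomed_codes]
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},{quote(prefterm)})")
    return lines


def splice_after_schema(script_text: str, new_lines: list) -> str:
    """Insert new_lines directly after the last 'SET SCHEMA PUBLIC' line."""
    lines = script_text.splitlines()
    hits = [i for i, line in enumerate(lines) if line.strip() == "SET SCHEMA PUBLIC"]
    if not hits:
        raise ValueError("no 'SET SCHEMA PUBLIC' line found")
    cut = hits[-1] + 1
    return "\n".join(lines[:cut] + new_lines + lines[cut:]) + "\n"


# Hypothetical placeholder values; look up real CUI/TUI/SNOMED values in the UMLS.
rows = new_concept_lines(9000001, ["60"], ["123456789"], "Example concept")
script = ("CREATE SCHEMA PUBLIC AUTHORIZATION DBA\n"
          "SET SCHEMA PUBLIC\n"
          "INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')\n")
print(splice_after_schema(script, rows))
```

Splicing after the last SET SCHEMA PUBLIC line follows the rule above that positions are unimportant as long as the rows sit below the header; remember to add at least one CUI_TERMS row for the new CUI as well, or nothing will be looked up.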
> > 
> > Rémy Sanouillet
> > NLP Engineer
> > re...@foreseemed.com
> > 
> > 
> > 
> > ForeSee Medical, Inc.
> > 12555 High Bluff Drive, Suite 100
> > San Diego, CA 92130
> > 
> > 
> > 
> > On Fri, May 29, 2020 at 7:34 AM <abad.ay...@cognizant.com> wrote:
> > Hi Team,
> > 
> > We recently set up cTAKES 4.0.0 as the NLP engine for our profile.
> > We have faced situations where some of the expected tokens are not
> > picked up by cTAKES during clinical text extraction. So our first
> > thought was to identify where the dictionary is configured and how it
> > can be updated. After some code analysis, we found that the
> > dictionary for the sources RxNorm and SNOMEDCT_US is configured in the
> > path below, under ctakes/resources.
> > 
> > We were able to open the HSQLDB database using the HSQLDB GUI and
> > found that some of our required entries are already there. Coming
> > specifically to our current problem: Pap Smear and Mammogram are two
> > clinical terms that are not currently recognized by cTAKES in our
> > profile.
> > 
> > •       If I look into the .script file, Pap Smear and
> > Mammogram/Mammography are already present in the .script file and in
> > the respective tables. Please find a snapshot below.
> > 
> > But this was still not recognized by cTAKES. I see there are some
> > filters working on top of the available dictionary entries
> > (ctakes-gui and ctakes-gui-res). Could these filters be the reason the
> > tokens are not recognized as expected? Could you please tell us what
> > exactly these filters do? This will also help us in the future when
> > we add new terms to the dictionary.
> > 
> > 
> > 
> > •       What are the steps if we need to add/edit entries in the
> > existing dictionaries? I see we can add/edit the existing values in
> > the .script files, but our primary doubt is: if I have a term 'xyz'
> > to be added to the dictionary, how can I get the CUI and other values
> > like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine if I give random
> > values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create
> > custom BSV dictionaries but couldn't find much documentation for
> > them. Kindly advise which is the better option of the three below.
> > 
> > 
> > 
> > o   Generate a custom dictionary using the MetamorphoSys UMLS
> > installation tool (where we provide ICD10, RxNORM and SNOMEDCT_US as
> > sources) and leverage the full set of .RRF files in the META folder.
> > Is this approach better if the entries to be populated are numerous?
> > 
> > o   Add/edit entries in the existing dictionary sno_rx_16ab; in that
> > case, how do we provide valid values for each column, like CUI, TUI,
> > RINDEX, TCOUNT and PREFTERM? If the entries to be populated are few,
> > would this approach be better?
> > 
> > o   Use a custom BSV; in that case, how should we add values to the
> > custom BSV? Could you also provide a sample?
> > 
> > I found a Metathesaurus browser at the URL below, where I can search
> > for terms and get the CUI and the respective source, like
> > ICD/CPT/MDR. But I was still unable to get the other required
> > attributes, like TUI, RINDEX, TCOUNT and PREFTERM. Could you please
> > briefly explain what these attributes signify?
> > 
> > 
> > https://uts.nlm.nih.gov/metathesaurus.html
> > 
> > Kindly advise us on how to proceed, and correct us if we went wrong
> > somewhere. This would be of great help to us.
> > 
> > P.S.: We comply with the UMLS license.
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > This e-mail and any files transmitted with it are for the sole use of
> > the intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the sender and destroy all copies of the original message. Any
> > unauthorized review, use, disclosure, dissemination, forwarding,
> > printing or copying of this email, and/or any action taken in
> > reliance on the contents of this e-mail is strictly prohibited and may
> > be unlawful. Where permitted by applicable law, this e-mail and other
> > e-mail communications sent to and from Cognizant e-mail addresses may
> > be monitored.
> >
