Thanks Tim.

I've been experimenting with the Penn Treebank and see some potential for
using it as a powerful disambiguation tool. The hard part is finding a
heuristic that minimizes the number of cases where the "big guns" need to
be brought in, because, yes, it would really slow things down.

Peter

On Tue, Sep 15, 2020 at 12:54 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Peter,
> The parts of speech come from the ctakes-pos-tagger module, which uses
> the OpenNLP pos tagger trained on clinical data. There is a
> constituency parser as well, which I think in theory can tag even
> better (that might be able to get you a unary branch in a tree from NN
> -> CD -> <number>.), but is a lot slower than the pos tagger and we
> probably don't want to make it necessary to run for simple dictionary
> pipelines.
> Tim
>
> On Tue, 2020-09-15 at 12:12 -0700, Peter Abramowitsch wrote:
> > * External Email - Caution *
> >
> >
> > Sean, this conversation raises a question that I've had for a
> > while.
> > Does the term-finding mechanism actually use a treebank to find the
> > POS, or does it use another, less rigorous approach? If it were
> > rigorous, wouldn't it be able to tag a pure number as an NN in the
> > role of object if it played the corresponding role in the sentence?
> >
> > I've not had the same problem as Ayyub, but I have been wondering
> > why one needed to disable the identification of cm as a genetic
> > acronym because of situations where cm is clearly part of a unit of
> > measure and would show up as an entity's modifier in a treebank.
> >
> > Does the question make sense?
> >
> > Peter
> >
> > On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu>
> > wrote:
> >
> > > I should mention that going the Paragraph route would only impact
> > > term
> > > lookup.
> > > ________________________________________
> > > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > > Sent: Tuesday, September 15, 2020 11:54 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > Thank you, Sean, for the response. We shall definitely try that
> > > approach. I have one question on the "f84.1" problem: since we have
> > > now developed a lot of features based on the output from cTAKES,
> > > is the impact of changing the sentenceDetectorAnnotator going to
> > > be huge?
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> > > Sent: Tuesday, September 15, 2020 9:06 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > [External]
> > >
> > >
> > > Hi Abad,
> > >
> > > The first thing that I would try for the "97112" problem is
> > > changing the
> > > parts of speech that are ignored for lookup.  Right now a pure
> > > number is
> > > ignored - it is not a word.  So, similar to what I said in my
> > > previous
> > > email, change the dictionary lookup parameter exclusionTags.  But
> > > to make
> > > sure that you get everything, you can first try no exclusions:
> > > set exclusionTags=""
> > >
> > > My guess with the F84.1 problem is that your sentence splitter is
> > > splitting "F84.1" but not splitting "F84 . 1".
> > >
> > > I think that the best way to start debugging is adding the
> > > PrettyTextWriter to the end of the piper and looking at its output
> > > (see my
> > > previous email).   It will print each sentence on a line and
> > > indicate the
> > > part of speech for each token.  If you can quickly and easily see
> > > what the
> > > system is doing then you might start to understand what needs to be
> > > changed
> > > to fit your data.
> > >
> > > Sean
> > > ________________________________________
> > > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > > Sent: Tuesday, September 15, 2020 11:15 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > Thank you, Sean, for the detailed response. I think there was a
> > > miscommunication from our end about the requirement. Your solution
> > > of adding spaces between the tokens in the entries worked, but it
> > > required the input text to have the spaces as well. If the text
> > > came in as 'F84.1', cTAKES did not recognize the token, but if the
> > > text came in as 'F84 . 1', cTAKES recognized the tokens for the
> > > INSERT script below.
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')
> > >
> > > But we encountered a similar issue when we configured an INSERT
> > > entry as
> > > below for CPT codes,
> > >
> > > INSERT INTO CUI_TERMS VALUES(41154,0,1, '97112','97112')
> > >
> > > Here 97112 is a CPT code (which usually doesn't have decimals or
> > > a '.'). We expected cTAKES to recognize the CPT code '97112' as a
> > > separate token, but it didn't. Could you please advise us on why
> > > this issue came up?
> > >
> > > Is there something wrong in the configuration? Do we need
> > > something additional for cTAKES to recognize the code alone as a
> > > separate token? Is there any other way to get the respective
> > > ICD/CPT code of the identified annotation from cTAKES, such as
> > > querying the CPT/ICD table using the fetched CUI? Kindly advise.
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> > > Sent: Monday, September 14, 2020 9:35 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > Hi Abad,
> > >
> > >
> > > I think that you need to make only one minor change.
> > >
> > >
> > > cTAKES uses "tokens" for identification and not the actual text.
> > > Tokenization turns text such as "F84.1" into "F84 . 1": the first
> > > token is F84, followed by a token encompassing '.' and another
> > > with '1'. In the .script file this is indicated by adding a space
> > > between each token. This makes the full entry:
> > >
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')
> > >
> > >
> > > Notice that the token length is now 3 and the full text contains
> > > the
> > > between-token spaces.  This would carry forward for the other
> > > entries, such
> > > as:
> > >
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,3,4, 'F84 . 1 pdd', 'pdd')
> > >
> > >
> > > Sean
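
Sean's point about tokens can be illustrated with a small Python sketch. This is not the cTAKES tokenizer: the regular expression is only an assumed approximation (runs of letters and digits form one token, any other non-space character is its own token), which is enough to reproduce the "F84.1" to "F84 . 1" example. The names `tokenize` and `term_text` are hypothetical.

```python
import re

# Assumed approximation of the tokenization described above, NOT the real
# cTAKES tokenizer: letter/digit runs are one token, each other non-space
# character (such as '.') is a token of its own.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return _TOKEN.findall(text)


def term_text(text):
    """Return (space-separated token string, token count) for a dictionary entry."""
    tokens = tokenize(text)
    return " ".join(tokens), len(tokens)


print(term_text("F84.1"))      # ('F84 . 1', 3)
print(term_text("F84.1 pdd"))  # ('F84 . 1 pdd', 4)
```

The token counts 3 and 4 agree with the third column of the two INSERT lines above.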
> > >
> > >
> > > ________________________________
> > > From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
> > > Sent: Monday, September 14, 2020 11:32 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > Hi Team,
> > >
> > > I hope you are all doing well. With your support, we were able to
> > > successfully add our required synonyms to the existing dictionary
> > > and could see that they were picked up by cTAKES. Now we have a
> > > requirement to configure ICD and CPT as well, for which we followed
> > > the steps in the cTAKES wiki and generated the respective .script
> > > file.
> > >
> > > The newly created dictionary, which comprises
> > > SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as
> > > expected, but we also need to extract the ICD code for the
> > > respective description. The scenario would be a text like the one
> > > below:
> > >
> > > ‘F84.1 pervasive developmental disorders’
> > >
> > > We would need cTAKES to recognize F84.1 as a token, or at least as
> > > an attribute of one of the ‘IdentifiedAnnotation’ objects. To
> > > achieve this, based on our prior experience, we tried to tweak the
> > > dictionary by adding synonyms for the existing CUI as below:
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,1,4, 'F84.1 pervasive developmental disorders', 'pervasive')
> > > INSERT INTO CUI_TERMS VALUES(4352,1,2, 'F84.1 pdd', 'pdd')
> > > INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')
> > >
> > > We have seen that cTAKES can identify ‘F84’ alone as a token, but
> > > it does not treat the code as a single token whenever a ‘.’ is
> > > encountered. As a result, cTAKES cannot return ICD codes such as
> > > F84.1 or M25.6 as separate tokens. Since almost all ICD codes
> > > contain a ‘.’, this way of tweaking the dictionary is not working.
> > > In fact, cTAKES is recognizing the digit after the decimal point
> > > within a ‘FractionAnnotation’.
> > >
> > > Does cTAKES have the capability to return a code such as an ICD
> > > code, either as an individual token or as an attribute of one of
> > > the tokens?
> > >
> > > Is there any other way in which the dictionary can be tweaked so
> > > that a synonym addition like the one below will recognize the ICD
> > > code as a token and return it from cTAKES?
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')
> > >
> > >
> > > Kindly check and advise us on how to proceed on this situation
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > From: Remy Sanouillet <re...@foreseemed.com>
> > > Sent: Tuesday, June 2, 2020 7:23 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES
> > >
> > > Hi Abad,
> > >
> > > •       How can we point cTAKES application to multiple
> > > dictionaries.
> > > Currently only sno_rx_16ab is pointed to the application, how can I
> > > tweak
> > > it to point that to multiple dictionary simultaneously. Or you
> > > meant to say
> > > create a fresh dictionary with all the vocabularies and point just
> > > that in
> > > cTAKES.
> > >
> > > If you go back in the archive a bit, you should find a thread where
> > > I went
> > > into detail on how to add multiple dictionaries. Combining all
> > > dictionaries
> > > into a fresh dictionary is not recommended for obvious reasons. If
> > > you
> > > can't find the thread, I will dig it up.
> > >
> > > •       So for these edits I will have to add INSERT queries to
> > > respective
> > > tables in the sno_rx_16ab.script file right? Do I need to make any
> > > more
> > > changes for these tokens to get reflected in cTAKES.
> > >
> > > Nope! That is all that is needed and next time you launch cTakes,
> > > it
> > > should recognize your new entries.
> > >
> > > •       If it is a non-existing CUI , I can get the respective
> > > CUI,TUI
> > > from here
> > >
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > > right?
> > >
> > > Correct! Remember that the ontology has multiple-inheritance so you
> > > need
> > > to grab all the TUIs for a given CUI.
> > >
> > > •       Based on the source I will have to add entry to respective
> > > table
> > > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > > one of it
> > > and not in all. Correct me if am wrong on this understanding
> > >
> > > That is also correct. And most of the time, the dictionaries only
> > > contain
> > > one CODE table so it is not even a question. However, sno_rx_16ab
> > > is an
> > > exception with both a CODE table for SNOMEDCT_US and RXNORM. They
> > > mostly do
> > > not overlap. I do remember that there were a couple of exceptions
> > > but, in
> > > the case where that happens, the metathesaurus will show it.
> > > For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes
> > > (372815001 and 68088000) *and* an RXNORM of 149.
> > >
> > > •       PREFTERM table will be having only one entry for each CUI
> > > right?
> > > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > > Correct me
> > > if am wrong on this understanding.
> > >
> > > You are correct here also. It is a one-to-one mapping although the
> > > system
> > > appears to tolerate when the PREFTERM is missing.
> > >
> > > Rémy Sanouillet
> > > NLP Engineer
> > > re...@foreseemed.com<mailto:xx...@foreseemed.com>
> > >
> > >
> > > ForeSee Medical, Inc.
> > > 12555 High Bluff Drive, Suite 100
> > > San Diego, CA 92130
> > >
> > > NOTICE: This e-mail message and all attachments transmitted with it
> > > are
> > > intended solely for the use of the addressee and may contain
> > > legally
> > > privileged and confidential information. If the reader of this
> > > message is
> > > not the intended recipient, or an employee or agent responsible for
> > > delivering this message to the intended recipient, you are hereby
> > > notified
> > > that any dissemination, distribution, copying, or other use of this
> > > message
> > > or its attachments is strictly prohibited. If you have received
> > > this
> > > message in error, please notify the sender immediately by replying
> > > to this
> > > message and please delete it from your computer.
> > >
> > >
> > > On Mon, Jun 1, 2020 at 7:56 AM <abad.ay...@cognizant.com<mailto:
> > > abad.ay...@cognizant.com>> wrote:
> > > Thank you Remy and Peter for your responses. I hope you guys are
> > > doing
> > > good and safe in this lock down period. Could you pls. help me on
> > > my below
> > > queries in creating an additional dictionary.
> > >
> > >
> > > •       How to create additional dictionary. You meant to say using
> > > the
> > > UMLS tool , so that using that tool we create .script files from
> > > .RRF files?
> > >
> > > •       How can we point cTAKES application to multiple
> > > dictionaries.
> > > Currently only sno_rx_16ab is pointed to the application, how can I
> > > tweak
> > > it to point that to multiple dictionary simultaneously. Or you
> > > meant to say
> > > create a fresh dictionary with all the vocabularies and point just
> > > that in
> > > cTAKES.
> > >
> > > I hope Remy was explaining editing the existing dictionary where I
> > > would
> > > deal with two scenarios where one was with existing CUI and other
> > > was with
> > > Non-existing CUI. Could you pls. resolve the below queries
> > > regarding the
> > > same.
> > >
> > >
> > > •       So for these edits I will have to add INSERT queries to
> > > respective
> > > tables in the sno_rx_16ab.script file right? Do I need to make any
> > > more
> > > changes for these tokens to get reflected in cTAKES.
> > >
> > > •       If it is a non-existing CUI , I can get the respective
> > > CUI,TUI
> > > from here
> > >
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > > right?
> > >
> > > •       Based on the source I will have to add entry to respective
> > > table
> > > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > > one of it
> > > and not in all. Correct me if am wrong on this understanding
> > >
> > > •       PREFTERM table will be having only one entry for each CUI
> > > right?
> > > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > > Correct me
> > > if am wrong on this understanding.
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > From: Remy Sanouillet <re...@foreseemed.com<mailto:
> > > re...@foreseemed.com>>
> > > Sent: Friday, May 29, 2020 9:25 PM
> > > To: dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
> > > Cc: u...@ctakes.apache.org<mailto:u...@ctakes.apache.org>
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES
> > >
> > > Hello Abad,
> > >
> > > The short answer is, yes, the sno_rx_16ab can be "hacked". A couple
> > > of
> > > caveats are that any mistake can stop all recognition and you will
> > > lose all
> > > your mods on updates. So an additional dictionary is a recommended
> > > approach.
> > >
> > > There are two cases. In the first, the CUI you are adding already
> > > exists and you are just adding a synonym. In that case, you only
> > > need to add one line:
> > > INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
> > > where:
> > >
> > >   *   CUI is the cui, nuf'said
> > >   *   TEXT is the tokenized lowercase string for the entry. In your
> > > case
> > > 'pap smear'. Most punctuation is a separate token. Single quotes
> > > are
> > > escaped by doubling them
> > >   *   RWORD is the one token in TEXT that is the most indicative
> > > (least
> > > common) which will be used as the index in the lookup. In your case
> > > probably 'pap' since it is not as common as 'smear'
> > >   *   RINDEX is the index of RWORD in TEXT. First token is 0 which
> > > is the
> > > case for 'pap'
> > >   *   TCOUNT is the token count for TEXT. In your case, 2
> > > So you would want to add:
> > > INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
> > >
> > >  If the entry is a non-existing one, you will need to add a few
> > > more
> > > lines. Their positions are unimportant as long as they are below
> > > the header
> > > lines (below the final "SET SCHEMA PUBLIC" line).
> > >
> > >   1.  INSERT INTO TUI VALUES(CUI,TUI)
> > > One line for each TUI in the taxonomy
> > >   2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are
> > > adding a
> > > SNOMED
> > >   3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is
> > > the
> > > pretty string to describe the entry. It need not correspond to any
> > > indexed
> > > entry. It is used for display once the lookup has been successful.
> > > That's it. Use at your own discretion. No guarantees.
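
The column rules Rémy lists for CUI_TERMS can be sketched in Python. This is only an illustration under stated assumptions: the helper name `cui_terms_insert` is hypothetical, the caller supplies the tokens and the chosen rare word, and nothing here validates the CUI or writes to the .script file.

```python
def cui_terms_insert(cui, tokens, rare_word):
    """Build an INSERT for CUI_TERMS(CUI,RINDEX,TCOUNT,TEXT,RWORD).

    Hypothetical helper: `tokens` is the already-tokenized entry and
    `rare_word` is the token chosen as the lookup index (RWORD).
    """
    lowered = [t.lower() for t in tokens]        # TEXT is lowercase
    rword = rare_word.lower()
    rindex = lowered.index(rword)                # 0-based position of RWORD in TEXT
    tcount = len(lowered)                        # number of tokens in TEXT
    text = " ".join(lowered).replace("'", "''")  # single quotes escaped by doubling
    rword_sql = rword.replace("'", "''")
    return f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},'{text}','{rword_sql}')"


print(cui_terms_insert(200845, ["pap", "smear"], "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

If the rare word is not one of the tokens, `list.index` raises `ValueError`, which is a reasonable failure mode here, since a row with a wrong RINDEX would presumably never match.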
> > >
> > >
> > > Rémy Sanouillet
> > > NLP Engineer
> > > re...@foreseemed.com<mailto:xx...@foreseemed.com>
> > >
> > >
> > >
> > > ForeSee Medical, Inc.
> > > 12555 High Bluff Drive, Suite 100
> > > San Diego, CA 92130
> > >
> > >
> > >
> > > On Fri, May 29, 2020 at 7:34 AM <abad.ay...@cognizant.com<mailto:
> > > abad.ay...@cognizant.com>> wrote:
> > > Hi Team,
> > >
> > > We recently set up cTAKES 4.0.0 as the NLP engine for our profile.
> > > We have faced situations where some of the expected tokens are not
> > > picked up by cTAKES during clinical text extraction. So our first
> > > thought was to identify where the dictionary is configured and how
> > > it can be updated. After some code analysis, we found that the
> > > dictionary is configured in the below path under ctakes/resources
> > > for sources RxNorm and SNOMEDCT_US
> > >
> > > We were able to open the hsqldb using the HSQLDB GUI and found
> > > that some of our required entries are already there. To come
> > > specifically to our current problem: Pap Smear and Mammogram are
> > > two clinical terms which are not currently recognized by cTAKES in
> > > our profile.
> > >
> > > •       If I look into the .script file , Pap Smear and
> > > Mammogram/Mammography is already present in the .script file and in
> > > the
> > > respective tables. PFB a snapshot as below
> > >
> > > But still these were not recognized by cTAKES. I see there are some
> > > filters working on top of the available dictionary entries
> > > (ctakes-gui and ctakes-gui-res). Is it because of these filters
> > > that the tokens are not recognized as expected? Could you please
> > > tell us what exactly these filters do? This will also help us in
> > > future when we try to add new terms to the dictionary.
> > >
> > >
> > >
> > > •       What are the steps if we need to add/edit entries in the
> > > existing dictionaries? I see we can add/edit the existing values in
> > > the .script files, but our primary doubt is this: if I have a term
> > > ‘xyz’ to be added to the dictionary, how can I get the CUI and
> > > other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to
> > > give any random value for the TUI/CUI/RINDEX/TCOUNT? I could also
> > > see options to create custom bsv dictionaries but couldn’t find
> > > much documentation for them. Kindly advise which is the better
> > > option of the 3 below.
> > >
> > >
> > >
> > > o   Generate a custom dictionary using the MetamorphoSys UMLS
> > > installation tool (where we provide the sources ICD10, RxNORM,
> > > SNOMEDCT_US) and leverage the full set of .rrf files in the meta
> > > folder. Is this approach better if there are many entries to
> > > populate?
> > >
> > > o   Add/edit the available dictionary sno_rx_16ab; in that case,
> > > how do we provide valid values for columns like CUI, TUI, RINDEX,
> > > TCOUNT and PREFTERM? Is this approach better if there are only a
> > > few entries to populate?
> > >
> > > o   Use a custom bsv; in that case, how should we add values to the
> > > custom bsv? Could you also provide a sample?
> > >
> > > I found a Metathesaurus browser at the URL below, where I can
> > > search for the terms and get the CUI and the respective source
> > > like ICD/CPT/MDR. But I was still unable to get the other required
> > > attributes to be populated, like TUI, RINDEX, TCOUNT and PREFTERM.
> > > Could you please briefly explain what these attributes signify?
> > >
> > >
> > >
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > >
> > > Kindly advise us on how to proceed on this and correct us if we
> > > went wrong
> > > somewhere. This would be of great help for us
> > >
> > > P.S : We comply with UMLS license
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > This e-mail and any files transmitted with it are for the sole use
> > > of the
> > > intended recipient(s) and may contain confidential and privileged
> > > information. If you are not the intended recipient(s), please reply
> > > to the
> > > sender and destroy all copies of the original message. Any
> > > unauthorized
> > > review, use, disclosure, dissemination, forwarding, printing or
> > > copying of
> > > this email, and/or any action taken in reliance on the contents of
> > > this
> > > e-mail is strictly prohibited and may be unlawful. Where permitted
> > > by
> > > applicable law, this e-mail and other e-mail communications sent to
> > > and
> > > from Cognizant e-mail addresses may be monitored.
> > >
>
