While this is doable, it will need some non-trivial post-processing. The
approach I suggest below is just one example; there are many ways to
achieve this, but there is no silver bullet.

To do something like that I suggest incorporating a TokensRegex analysis
engine in your pipeline.  I have had a lot of success with
https://github.com/JuleStar/uima-tokens-regex

Engines like this let you combine standard string-based regex with
expressions on properties of Annotations: a MetaRegex.  They let you
choose the AnnotationType you prefer to operate on.  (Stanford's
TokensRegex for CoreNLP is even more powerful.)

Write TokensRegex rules that look for ConllDep nodes whose text is like
clinic/visit/specialist/referral, or whatever you are searching for, and
assign a unique tag to each matching token.  Let's say you name the tag
CLINIC.  It's a custom NER, basically.
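
To make the tagging idea concrete, here is a minimal sketch in plain
Python rather than in the rule syntax of any TokensRegex engine. The
Token class, its field names and the word list are assumptions made up
for illustration; they stand in for ConllDep nodes exported from the CAS
and are not part of cTAKES or uima-tokens-regex.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a ConllDep node exported from the CAS."""
    begin: int
    end: int
    text: str
    pos: str
    tag: str = ""


# Illustrative word list only; extend it to whatever you are searching for.
CLINIC_RE = re.compile(r"^(clinic|visit|specialist|referral)s?$", re.IGNORECASE)


def tag_clinic_tokens(tokens):
    """Assign the custom tag CLINIC to every token whose text matches."""
    for tok in tokens:
        if CLINIC_RE.match(tok.text):
            tok.tag = "CLINIC"
    return tokens


# Offsets follow "I went to the Epilepsy Clinic"; POS values are hard-coded
# here for illustration, not produced by a real tagger.
tokens = [Token(14, 22, "Epilepsy", "JJ"), Token(23, 29, "Clinic", "NN")]
tag_clinic_tokens(tokens)
```

In a real pipeline the rule would live in the analysis engine, so the tag
would already be present on the tokens when you read the CAS.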

Output your CAS object and start processing here:

Scan the ConllDep tokens of your document looking for one with the new tag
CLINIC.

If you find one, find the sentence boundary around this token, using
the Sentence annotations.

Then use the POS attribute of all the ConllDep tokens within that sentence
boundary to look for a modifier token (POS=JJ) of the token (POS=NN) that
you tagged.
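
The scan described above could be sketched roughly as follows. This is
plain Python over simple records, not the UIMA API: the Span class, its
fields and the function names are hypothetical, and it assumes you have
already exported the tokens and Sentence annotations with their offsets.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical record for a token or sentence taken from the CAS."""
    begin: int
    end: int
    text: str = ""
    pos: str = ""
    tag: str = ""


def enclosing_sentence(token, sentences):
    """Return the sentence whose offsets contain the token, or None."""
    for sent in sentences:
        if sent.begin <= token.begin and token.end <= sent.end:
            return sent
    return None


def modifiers_for(clinic, tokens, sentences, modifier_pos="JJ"):
    """Collect tokens with the modifier POS inside the clinic token's sentence."""
    sent = enclosing_sentence(clinic, sentences)
    if sent is None:
        return []
    return [
        t for t in tokens
        if t is not clinic
        and t.pos == modifier_pos
        and sent.begin <= t.begin and t.end <= sent.end
    ]


# "I went to the Epilepsy Clinic" with hand-written offsets and POS values.
sentences = [Span(0, 29)]
tokens = [
    Span(0, 1, "I", "PRP"), Span(2, 6, "went", "VBD"),
    Span(7, 9, "to", "TO"), Span(10, 13, "the", "DT"),
    Span(14, 22, "Epilepsy", "JJ"), Span(23, 29, "Clinic", "NN", "CLINIC"),
]
clinic = next(t for t in tokens if t.tag == "CLINIC")
mods = modifiers_for(clinic, tokens, sentences)
```

This version returns every JJ token in the sentence; restricting it to
tokens adjacent to, or dependent on, the CLINIC token is a natural
refinement.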

Now look through the DiseaseDisorderMentions and ProcedureMentions for a
token whose offsets match the offsets of your JJ ConllDep token.  If you
have a hit, then you can use it to find the core SNOMED code for Headache
Clinic, Epilepsy Clinic, Dialysis Clinic etc.  Once you have this you
will need to manually add the post coordinations to the SNOMED ref pointed
to by the "(Disease|Procedure)Mention" token.  You can elaborate on this
theme to capture more complex cases where the modifier is expressed
differently or is not adjacent to the "CLINIC" token.
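
The offset matching could look something like the sketch below. The
Mention class, its fields and the placeholder code string are all
assumptions for illustration; no real SNOMED identifier is used, and how
you read the mentions out of the CAS is left to your own tooling.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical record for a DiseaseDisorderMention or ProcedureMention."""
    begin: int
    end: int
    kind: str
    code: str  # placeholder string, not a real SNOMED code


def mentions_at(begin, end, mentions):
    """Return the mentions whose offsets exactly match the given span."""
    return [m for m in mentions if m.begin == begin and m.end == end]


def pair_modifier_with_clinic(modifier_span, clinic_span, mentions):
    """Pair a clinic token span with the codes found on its modifier span."""
    hits = mentions_at(modifier_span[0], modifier_span[1], mentions)
    return {
        "clinic_span": clinic_span,
        "modifier_codes": [m.code for m in hits],
    }


# Offsets for "Epilepsy" (14-22) and "Clinic" (23-29) in
# "I went to the Epilepsy Clinic".
mentions = [Mention(14, 22, "DiseaseDisorderMention", "CODE_FOR_EPILEPSY")]
result = pair_modifier_with_clinic((14, 22), (23, 29), mentions)
```

The resulting pairing is only the raw material; turning it into a post
coordinated expression is the manual step described above.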

I created a framework in Ruby to post-process a CAS in this way, although
I never went as far as generating SNOMED modifiers as they weren't needed
in my case.  If not Ruby, use some other language that allows efficient
manipulation of complex data structures in very few lines of code.
Otherwise it will get ugly fast.


On 10/21/16, 3:03 AM, "Finan, Sean" <sean.fi...@childrens.harvard.edu>
wrote:

>Hi Arron,
>
>
>
>Ctakes discovers text words and phrases by lookup using a subset of the
>UMLS (https://uts.nlm.nih.gov/home.html).  Ctakes then assigns a code to
>everything that it finds.
>
>
>
>While you can employ various workarounds to remove "epilepsy" when it
>occurs within "epilepsy clinic", these are not part of the standard
>ctakes distribution or workflow.
>
>
>
>Sean
>
>
>
>-----Original Message-----
>
>From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
>
>Sent: Thursday, October 20, 2016 6:56 PM
>
>To: dev@ctakes.apache.org
>
>Subject: Post co-ordinated SNOMED-CT with
>AggregatePlaintextFastUMLSProcessor
>
>
>
>Hi,
>
>
>
>Just wondering if someone could point me in the direction of how ctakes
>produces post coordinated SNOMED-CT? Using the
>AggregatePlaintextFastUMLSProcessor the individual concepts come out
>quite nicely, however if you take the following phrase "I went to the
>Epilepsy Clinic", I can't see how the final post coordinated SNOMED
>concepts are formed, and it appears I have a list of sub concepts
>(pre-coordinated) that includes the disorder epilepsy (which merely going
>to the clinic would not confirm).
>
>
>
>Any help would be great thanks - enjoying working with ctakes and hoping
>to include it in an NLP paper on some UK healthcare data.
>
>
>
>Arron Lacey
>
>Research Analyst
>
>SAIL Databank
>
>Swansea Neuroscience Research Group
>
>01792 602023
>
>a.s.la...@swansea.ac.uk
>
>
>
