Hi Justin, Kean,

Kean: cheers for some great answers!

Justin:
>> cTakes did not identify any words in any of these sentences:
>> "The patient fell."
>> "The patient fell again after using the restroom."
>> "The patient has not fallen since the afternoon."

My immediate thought is that Kean's assessment is correct:
>I'd think LVG would come up with "fall" as the canonicalForm of "fell" and 
>"fallen", but apparently it doesn't.
Justin, is LVG in your pipeline before dictionary lookup?  If not, try adding 
it and see if there is any difference.  
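
To make the canonical-form idea concrete, here is a minimal sketch of what 
normalization before lookup is expected to buy you.  It is a toy model, not 
the LVG or cTAKES API: the class name CanonicalLookup, the two-entry map of 
irregular forms and the three-term dictionary are all hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy illustration only; none of these names come from cTAKES or LVG. */
final class CanonicalLookup {
    // The three English terms listed for the concept in this thread.
    static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("fall", "falls", "falling"));

    // Assumption: a normalizer would map these inflected forms to "fall".
    static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("fell", "fall");
        CANONICAL.put("fallen", "fall");
    }

    /** Returns true if the token, optionally normalized first, is in the dictionary. */
    static boolean matches(String token, boolean normalize) {
        String key = token.toLowerCase(Locale.ROOT);
        if (normalize) {
            key = CANONICAL.getOrDefault(key, key);
        }
        return DICTIONARY.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(matches("fell", false)); // raw token, no normalization
        System.out.println(matches("fell", true));  // normalized to "fall" first
    }
}
```

If the lookup only ever sees the raw token, "fell" can never match; it has to 
be normalized first or listed in the dictionary explicitly.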

If LVG doesn't work, then I think that Kean's idea is a good one:
> Justin, for this particular purpose, you could create a custom BSV dictionary 
> associating "fell" and "fallen" with your target concept.

And now, this:
The word "fell" is actually not in the UMLS, so you really do need LVG or a 
custom dictionary.  Check the entry for the CUI that you want:
https://uts.nlm.nih.gov/metathesaurus.html#C0085639;0;1;CUI;2017AA;EXACT_MATCH;CUI;*;
and you will see that there is no atom (synonym) "fell".
You will see one instance of the synonym "fallen" ... but pay careful 
attention: it is for the German language!  There is no English synonym 
"fallen".
The UMLS only has the English synonyms "fall", "falls" and "falling".

Or run a Metathesaurus search for "fell" and you will get results that do not 
apply to a loss in one's battle against gravity.  Since the word "fell" is not 
in the UMLS, it will not be in the cTAKES default dictionary.  A search for 
"fallen" matches only through the German synonym.
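
As a rough sketch of the custom-dictionary route, the rows for a BSV file 
could be generated like this.  Assumptions: the CUI|TUI|text|preferred-term 
column layout (check it against the example BSV files Kean linked), the TUI 
T033 (inferred from the "33" in Kean's query output), and the helper class 
BsvRows itself, which is hypothetical and not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of cTAKES: formats rows for a custom BSV dictionary. */
final class BsvRows {
    /** Builds one bar-separated row per surface form (assumed layout: CUI|TUI|text|preferred term). */
    static List<String> rows(String cui, String tui, String preferredTerm,
                             List<String> surfaceForms) {
        List<String> out = new ArrayList<>();
        for (String form : surfaceForms) {
            // Lower-case the text so the rows are consistent with each other.
            out.add(String.join("|", cui, tui, form.toLowerCase(Locale.ROOT), preferredTerm));
        }
        return out;
    }

    public static void main(String[] args) {
        // The forms missing from the UMLS English synonyms, per this thread.
        for (String row : rows("C0085639", "T033", "Falls", Arrays.asList("fell", "fallen"))) {
            System.out.println(row);
        }
    }
}
```

The resulting file would still have to be registered in the dictionary 
configuration that the pipeline loads; that step is not shown here.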

>> cTakes picked up on these, but only because it matched "did".
>> The other problem is that cTakes seems to need "have a" or "had a" in 
>> front of "fall" in order to pick it up.
Can you explain how you arrived at this conclusion?  My guess is that the part 
of speech assigned to "fall" changes between sentences: one tag (probably the 
verb in "did not fall") is ignored for dictionary lookup, while another 
(probably the noun in "had a fall") is used.  You can change which parts of 
speech are ignored, but gaining a true positive for "fall" may also create 
false positives for other terms.  Sometimes the tradeoff is worth it ... 
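
A toy sketch of that part-of-speech effect, to show why the same word can be 
found in one sentence and missed in another.  Everything here is an 
assumption for illustration: the class PosFilteredLookup, the hand-assigned 
Penn Treebank tags and the choice of verb tags as the excluded set.  It does 
not reproduce the actual cTAKES lookup logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of a dictionary lookup that skips tokens with excluded POS tags. */
final class PosFilteredLookup {
    static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("fall", "falls", "falling"));

    // Assumption: verb tags are excluded from lookup.
    static final Set<String> VERB_TAGS =
            new HashSet<>(Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"));

    /** Each token is a {word, posTag} pair; returns the dictionary hits. */
    static List<String> lookup(String[][] taggedTokens, Set<String> excludedTags) {
        List<String> hits = new ArrayList<>();
        for (String[] token : taggedTokens) {
            String word = token[0].toLowerCase(Locale.ROOT);
            String tag = token[1];
            if (!excludedTags.contains(tag) && DICTIONARY.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[][] hadAFall = {{"had", "VBD"}, {"a", "DT"}, {"fall", "NN"}};
        String[][] didNotFall = {{"did", "VBD"}, {"not", "RB"}, {"fall", "VB"}};
        System.out.println(lookup(hadAFall, VERB_TAGS));                // noun "fall" is looked up
        System.out.println(lookup(didNotFall, VERB_TAGS));              // verb "fall" is skipped
        System.out.println(lookup(didNotFall, Collections.emptySet())); // nothing excluded
    }
}
```

Emptying the excluded set recovers the verb "fall", but every other verb then 
becomes a lookup candidate too, which is where the extra false positives 
would come from.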

Sean


-----Original Message-----
From: Kean Kaufmann [mailto:k...@recordsone.com] 
Sent: Friday, July 14, 2017 3:18 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes doesn't identify certain words like "fell" in clinical 
notes [EXTERNAL]

I'd think LVG would come up with "fall" as the canonicalForm of "fell" and 
"fallen", but apparently it doesn't.

The only terms associated with C0085639 in my custom-built dictionary are:

sql> select cui, tui, text, prefterm from cui_terms c
       join tui t on t.cui = c.cui
       join prefterm p on p.cui = c.cui and cui=85639;
  CUI  TUI  TEXT     PREFTERM
-----  ---  -------  --------
85639   33  falling  Falls
85639   33  falls    Falls
85639   33  fall     Falls


Justin, for this particular purpose, you could create a custom BSV dictionary 
associating "fell" and "fallen" with your target concept.
Examples here:
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/
 

Sean, does this seem right?  (I have some side questions on LVG, too, since you 
mention you don't use it in your pipelines... but I suppose those should be 
under separate cover.)

-Kean


On Fri, Jul 14, 2017 at 2:32 PM, Justin Brown <jb613...@gmail.com> wrote:

> I'm using the FastPipeline and cTakes 4.0 (dev version) to process a 
> note that contains sentences about falling. After the note is run 
> through the pipeline, I pick out only the IdentifiedAnnotations in the 
> note, the sentences that contain them, and the CUIs of each 
> IdentifiedAnnotation. Here are the results:
>
> cTakes did not identify any words in any of these sentences:
> "The patient fell."
> "The patient fell again after using the restroom."
> "The patient has not fallen since the afternoon."
>
> cTakes identified the keyword "fall" in all of these, including its CUI.
> "The patient had a fall in the morning during a shift change."
> "The patient had a fall."
> "The patient had no fall."
> "The patient did not have a fall."
>
> cTakes picked up on these, but only because it matched "did".
> "The patient didn't fall."
> "The patient did not fall."
>
> The first problem is, I need cTakes to identify "fell" and all other 
> versions of "fall", and give me its CUI (which according to MetaMap is 
> C0085639).
> The other problem is that cTakes seems to need "have a" or "had a" in 
> front of "fall" in order to pick it up.
>
> Here is the code I'm using to process notes:
>
> JCas jcas = JCasFactory.createJCas();
> jcas.setDocumentText(note);
> AggregateBuilder builder = new AggregateBuilder();
> builder.add(ClinicalPipelineFactory.getFastPipeline());
> SimplePipeline.runPipeline(jcas, builder.createAggregateDescription());
>
> for (IdentifiedAnnotation entity : JCasUtil.select(jcas, IdentifiedAnnotation.class)) {
>     ...
> }
>
> If there is a better way of processing notes and getting CUIs, please 
> let me know.
>
> Thank You,
>
> Justin
>
