Paul asks:

> I am looking for a NLP to read pathology reports and extract cancer
> related site, histology, stage and any other DX/RX data available. In
> looking at CTakes, I have a few questions;
>
> - Is CTakes an appropriate tool to automate this task?

I wrote a commercial surgical-pathology coding module some years ago, and
I could imagine doing the same in cTAKES.
Here are my two cents to add to the wealth of information Peter has already
provided.
Best of luck.

> Where can I find an "executive overview" (30,000 foot view) of how the
> CTakes works?

As Peter said, there's a lot of documentation out there!
Videos here: https://ctakes.apache.org/tutorials.html
Key point: it's built on top of UIMA https://uima.apache.org/
which ingests and annotates data from any source, letting you mix, match
and create your own annotators to build chains of analyses.
The cTAKES value-adds include a clinical type system and a spiffy
dictionary (see below).
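
To make the "chain of annotators" idea concrete, here is a toy sketch in
Python. It is not the UIMA or cTAKES API (both are Java); the Document and
Annotation classes and the two annotator functions are hypothetical names
invented for illustration. Each annotator reads a shared document and adds
its own annotations, and the pipeline simply runs them in order.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "Sentence" or "Term" (made-up type names)
    begin: int  # start character offset, inclusive
    end: int    # end character offset, exclusive


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def covered(self, annotation):
        """Return the text span an annotation points at."""
        return self.text[annotation.begin:annotation.end]


def sentence_annotator(doc):
    # Naive splitter: a run of characters up to ., ! or ? (breaks on "2.5 cm").
    for m in re.finditer(r"[^.!?\n]+[.!?]?", doc.text):
        if m.group().strip():
            doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def term_annotator(doc, terms=("carcinoma", "melanoma")):
    # Stand-in for a dictionary lookup: case-insensitive literal matching.
    for term in terms:
        for m in re.finditer(re.escape(term), doc.text, re.IGNORECASE):
            doc.annotations.append(Annotation("Term", m.start(), m.end()))


def run_pipeline(doc, annotators):
    # Annotators run in order; later ones could read what earlier ones added.
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    Document("Invasive ductal carcinoma. Margins clear."),
    [sentence_annotator, term_annotator],
)
for a in doc.annotations:
    print(a.kind, repr(doc.covered(a)))
```

Swapping in a different sentence splitter or term list means changing one
entry in the list passed to run_pipeline, which is the "mix and match"
property in miniature.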

> My ignorance regarding NLP algorithms like CTakes is whether it is
> keyword driven, or it is self learning.

cTAKES is *not* "self-learning"; you have to tell it exactly what
information you want to extract from where.

Pro: High precision; explainable; you won't get the right answer for the
wrong reason.
Con: Low recall; brittle; you may not get answers at all! If you're
processing unpredictable document formats from many different facilities,
it can be hard to generalize over them.

> I currently have a homegrown application which looks for keywords and
> negation modifiers within a certain distance from the keywords

cTAKES can certainly help with that.

   - *Keywords*
   cTAKES lets you use the NLM's UMLS Metathesaurus
   <https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html>,
   using the dictionary framework Peter mentioned:
   https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup
   These sources may be useful in building your custom dictionary:
      - the NCI Thesaurus:
      https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NCI/index.html
      - CPT, if you want codes from there:
      https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CPT/index.html
      - For anatomy, I'm not familiar with the "anatomical site annotator"
      Peter alludes to, but the FMA is better structured than SNOMED:
      https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/FMA/index.html
   - *Negation*
   Several annotators available:
   https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Negation+Annotators
   Distance-from-keywords is a start, but sentence detection and shallow
   parsing both help.
   I like the ctakes-ytex-uima NegexAnnotator and SentenceDetector.
   - *Document structure*
   I found header detection to be crucial in processing pathology reports:
   tracking specimens through a document, extracting tumor info from
   tables, etc.
   The cTAKES RegexSectionizer might work for you.
   https://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/ae/RegexSectionizer.html
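
As a rough illustration of the three ideas above (keywords, negation scoped
to a sentence rather than to a raw character distance, and header-based
sectioning), here is a small Python sketch. It uses only the standard
library and does not call cTAKES; the header pattern, the keyword list, the
negation cues and all function names are assumptions made up for the
example, and real reports will need much richer rules.

```python
import re

# Assumed header style: an all-caps label at the start of a line, then a colon.
HEADER = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)
KEYWORDS = ("carcinoma", "metastasis")             # stand-in for a dictionary
NEGATION_CUES = ("no", "negative for", "without")  # tiny NegEx-style cue list


def split_sections(text):
    """Return (header, body) pairs; text before the first header gets None."""
    sections, last_header, last_end = [], None, 0
    for m in HEADER.finditer(text):
        body = text[last_end:m.start()].strip()
        if body or last_header is not None:
            sections.append((last_header, body))
        last_header, last_end = m.group(1), m.end()
    sections.append((last_header, text[last_end:].strip()))
    return sections


def split_sentences(text):
    # Naive: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(text):
    """Find the first hit of each keyword per sentence and flag negation."""
    results = []
    for header, body in split_sections(text):
        for sentence in split_sentences(body):
            lowered = sentence.lower()
            for keyword in KEYWORDS:
                pos = lowered.find(keyword)
                if pos < 0:
                    continue
                # Only cues that precede the keyword in the same sentence count.
                before = lowered[:pos]
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", before)
                    for cue in NEGATION_CUES
                )
                results.append({
                    "section": header,
                    "keyword": keyword,
                    "negated": negated,
                    "sentence": sentence,
                })
    return results


report = """FINAL DIAGNOSIS:
Invasive ductal carcinoma, grade 2. No evidence of metastasis.
COMMENT:
Margins are negative for carcinoma."""

for hit in find_mentions(report):
    status = "negated" if hit["negated"] else "affirmed"
    print(hit["section"], hit["keyword"], status)
```

Because the negation scope stops at the sentence boundary, the "No" in the
second sentence does not leak onto the carcinoma in the first one, which a
pure character-window rule can get wrong. The sketch only looks for cues
before the keyword, so phrasing like "carcinoma is absent" would be missed.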

_____________________________________________________
*Kean Kaufmann*
NLP Architect

RecordsOne
nSight Driven | *Priority. Clarity. Integrity. *


On Thu, Aug 10, 2023 at 1:06 PM Paul Stearns <pa...@compuace.com.invalid>
wrote:

> I am looking for a NLP to read pathology reports and extract cancer
> related site, histology, stage and any other DX/RX data available. In
> looking at CTakes, I have a few questions;
>
> - Is CTakes an appropriate tool to automate this task?
> - The end goal would be a fully automated tool where text was presented to
> an API and data was returned.
> - An added bonus, would be for the tool to annotate the text, so that a
> reviewer can more easily find the relevant data.
> - For someone with a strong IT/software development background, but no NLP
> background what is the level of difficulty in getting started with this
> product?
>
> Paul R. Stearns
> Advanced Consulting Enterprises, Inc.
> 15150 NW 79th Court,
> Suite: 206
> Miami Lakes Fl, 33016
>
> Voice: (305)623-0360 x107
> Fax: (305)623-4588
>
