RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Gandhi Rajan Natarajan Thu, 28 Sep 2017 23:26:13 -0700

Hi Sean,

Thanks again for the response. I guess it was a mistake on my side that I didn't 
send the complete text. Did you mean that with the text I sent, the 
co-reference superscript-1 will be lost?


Also, as per your advice, we have created an issue - 
https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement FSM changes 
and attached the file changes. Could someone have a look and let us know your 
thoughts, please?

Regards,
Gandhi


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Thursday, September 28, 2017 8:21 PM
To: [email protected]
Cc: Miller, Timothy <[email protected]>
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:  
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:   
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:   
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Thursday, September 28, 2017 4:09 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 
sample HTML using  piper GUI by  having only that single line - " The patient 
started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 
(days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. 
" in the input file.

But when I change the input file content with the following lines:

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for deep vein 
thrombosis. This patient did not take Thalomid on the day of his admittance to 
the hospital, but resumed treatment shortly after with no return of symptoms. 
On 07/15/02, he was discharged in stable condition. There have been no further 
reports of bleeding at this time. Thedoctor has assessed the hematochezia as 
related to Coumadin treatment and previously diagnosed diverticulosis, and not 
to protocol therapy with Thalomid and Epirubicin.Additional information 
received from the investigator on 27Aug02 reveals that this male patient began 
on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin.  His post 
cycle two computed tomography scans revealed increase in size of liver lesion 
with development of multiple new satellite nodules.  On 29Jul02, the 
investigator removed this patient from protocol for progressive disease and 
recommended hospice care.  After seeking a second opinion from two other 
institutions, this patient was admitted to hospice on 05Aug02.  On 20Aug02, the 
investigator noted that this patient was suffering worsening fatigue and got 
tired getting out of his chair.  On 25Aug02, this patient died due to disease 
progression.  The investigator assessed the death as not related to study 
treatment and expected"

The co-reference superscript is then lost. Did you try the complete text 
above in your piper GUI, by any chance? Also, I guess you did not notice the 
question in my last post - " Sean, we have also made some code changes in 
MeasurementFSM.java to identify certain measurements like '20 mg/m2' which was 
not identified out of the box.  Should we send the code changes to you so that 
you can consider the same to be productized ? Please advise."


Regards,
Gandhi


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Wednesday, September 27, 2017 5:53 PM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I am glad that you are feeling better.
I don't understand why you aren't getting the same output as me.  I just ran 
your example sentence with your piper with a fresh checkout and get the html 
below.  The css follows.  Copy and paste into a file and see if you see the 
corefs.

/////////////////////////////////////////////////////  html, copy into file  
/////////////////////////////////////////////////

<!DOCTYPE html>
<html>
<head>
  <title>OneLiner Output</title>
</head>
<body>
<link rel="stylesheet" href="ctakes.pretty.css" type="text/css" media="screen"> 
<h2>OneLiner</h2>  <i>Text processing finished on: 9 27 2017, 08:15:31</i> <hr>

<div id="content">

<p>
The patient
<span class="AFF_" onClick="iaf('AFF_NL_EVTNL_startedNL_SPC_[before] doc timeNL_NL_')" TIP="Event ">started</span> study
<span class="AFF_" onClick="iaf('AFF_NL_EVTNL_treatmentNL_SPC_[before] doc timeNL_NL_PRCNL_treatmentNL_SPC_C0087111NL_SPC_[Therapeutic procedure]NL_SPC_[before] doc timeNL_NL_')" TIP="Event Procedure ">treatment</span><span class="PRC"><sup>&bull;</sup></span> of
<span class="AFF_" onClick="iaf('AFF_NL_DRGNL_ThalomidNL_SPC_C0723668NL_SPC_[before] doc timeNL_NL_')" TIP="Drug ">Thalomid</span><span class="DRG"><sup>&bull;</sup></span>
<span class="AFF_" onClick="iaf('AFF_NL_EVTNL_200mgNL_SPC_[before] doc timeNL_NL_')" TIP="Event ">200mg</span><span class="UNK" onClick="crf1()"><sup>1</sup></span> (
<span class="GNR_" onClick="iaf('GNR_NL_TMXNL_daysNL_NL_')" TIP="Time ">days</span> 1 - 21 ) , and
<span class="AFF_" onClick="iaf('AFF_NL_DRGNL_EpirubicinNL_SPC_C0014582NL_SPC_[before] doc timeNL_NL_')" TIP="Drug ">Epirubicin</span><span class="DRG"><sup>&bull;</sup></span> , 20 mg / m2 (
<span class="GNR_" onClick="iaf('GNR_NL_TMXNL_days 1 , 8NL_NL_')" TIP="Time ">days 1 , 8</span> , and 15 ) on
<span class="GNR_" onClick="iaf('GNR_NL_TMXNL_06 / 07 / 02NL_SPC_[CONTAINS] treatmentNL_NL_')" TIP="Time ">06 / 07 / 02</span> for the
<span class="AFF_" onClick="iaf('AFF_NL_EVTNL_treatmentNL_SPC_[before] doc timeNL_SPC_06 / 07 / 02 [CONTAINS]NL_NL_PRCNL_treatmentNL_SPC_C0087111NL_SPC_[Therapeutic procedure]NL_SPC_[before] doc timeNL_NL_')" TIP="Event Procedure ">treatment</span><span class="PRC"><sup>&bull;</sup></span> of
<span class="AFF_" onClick="iaf('AFF_NL_DISNL_hepatocellular carcinomaNL_SPC_C2239176NL_SPC_[Liver carcinoma]NL_SPC_[before] doc timeNL_NL_')" TIP="Disorder ">hepatocellular </span><span class="AFF_" onClick="iaf('AFF_NL_DISNL_hepatocellular carcinomaNL_SPC_C2239176NL_SPC_[Liver carcinoma]NL_SPC_[before] doc timeNL_NL_EVTNL_carcinomaNL_SPC_[before] doc timeNL_NL_')" TIP="Disorder Event ">carcinoma</span><span class="DIS" onClick="crf1()"><sup>1</sup></span> .
<br>

</p>

</div>

<div id="ia"> Annotation Information </div> <script type="text/javascript">
  function iaf(txt) {
    var aff=txt.replace( /AFF_/g,"<br><h3>Affirmed</h3>" );
    var neg=aff.replace( /NEG_/g,"<br><h3>Negated</h3>" );
    var unc=neg.replace( /UNC_/g,"<br><h3>Uncertain</h3>" );
    var unn=unc.replace( /UNN_/g,"<br><h3>Uncertain, Negated</h3>" );
    var ant=unn.replace( /ANT/g,"<b>Anatomical Site</b>" );
    var dis=ant.replace( /DIS/g,"<b>Disease/ Disorder</b>" );
    var fnd=dis.replace( /FND/g,"<b>Sign/ Symptom</b>" );
    var prc=fnd.replace( /PRC/g,"<b>Procedure</b>" );
    var drg=prc.replace( /DRG/g,"<b>Medication</b>" );
    var evt=drg.replace( /EVT/g,"<b>Event</b>" );
    var tmx=evt.replace( /TMX/g,"<b>Time</b>" );
    var unk=tmx.replace( /UNK/g,"<b>Unknown</b>" );
    var spc=unk.replace( /SPC_/g,"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;" );
    var prf1=spc.replace( /\[/g,"<i>" );
    var prf2=prf1.replace( /\]/g,"</i>" );
    var nl=prf2.replace( /NL_/g,"<br>" );
    document.getElementById("ia").innerHTML = nl;
  }
  function crf1() {
    document.getElementById("ia").innerHTML = "<br><h3>Coreference Chain</h3>study treatment of Thalomid 200mg<br>the treatment of hepatocellular carcinoma";
  }
</script></body>
</html>



/////////////////////////////////////////////////////  css, copy into file 
named ctakes.pretty.css in same directory as html   
/////////////////////////////////////////////////



.GNR_ {
  position: relative;
  display: inline-block;
  border-bottom: 0.10em solid gray;
}

.AFF_ {
  position: relative;
  display: inline-block;
  border-bottom: 0.15em solid green;
}

.UNC_ {
  position: relative;
  display: inline-block;
  border-bottom: 0.16em dotted gold;
}

.NEG_ {
  position: relative;
  display: inline-block;
  border-bottom: 0.16em dashed red;
}

.UNN_ {
  position: relative;
  display: inline-block;
  border-bottom: 0.16em dashed orange;
}

.FND {
  color: magenta;
}

.DIS {
  color: black;
}

.DRG {
  color: red;
}

.PRC {
  color: blue;
}

.ANT {
  color: gray;
}

.UNK {
  color: gray;
}

[TIP] {
  position: relative;
  z-index: 2;
  cursor: pointer;
}
[TIP]::before,
[TIP]::after {
  visibility: hidden;
  -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=0)";
  filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=0);
  opacity: 0;
  pointer-events: none;
}
[TIP]::before {
  position: absolute;
  bottom: 0%;
  left: 100%;
  margin-bottom: 5px;
  padding: 7px;
  -webkit-border-radius: 3px;
  -moz-border-radius: 3px;
  border-radius: 3px;
  background-color: #000;
  background-color: hsla(0, 0%, 20%, 0.9);
  color: #fff;
  content: attr(TIP);
  text-align: center;
  font-size: 14px;
  line-height: 1.2;
}
[TIP]:hover::before,
[TIP]:hover::after {
  visibility: visible;
  -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=100)";
  filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=100);
  opacity: 1;
}

div#ia {
  position: fixed;
  top: 0;
  right: 0;
  width: 20%;
  height: 100%;
  padding: 10px;
  overflow: auto;
  background-color: lightgray;
}

div#content {
  width: 79%;
  height: 100%;
  padding: 10px;
  overflow: auto;
}









-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Wednesday, September 27, 2017 4:40 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Sorry for the delayed response as I was out of office due to illness. If I 
don't add BackwardsTimeAnnotator, I don't see any error related to isTraining 
param. But still couldn't get the superscript co-reference working. Please note 
that I am using the latest 4.0.1 jars. The piper file and console log messages 
are as follows:

PIPER FILE:
// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
//add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator
// Html output
add pretty.html.HtmlTextWriter
// XMl writer
add FileTreeXmiWriter

CONSOLE LOG:

22 Sep 2017 13:59:44  INFO ClearNLPSemanticRoleLabelerAE - Finished initializing
22 Sep 2017 13:59:44  INFO CleartkAnalysisEngine - Starting initializing for 
Assigning Attributes
22 Sep 2017 13:59:46  INFO CleartkAnalysisEngine - Finished initializing
22 Sep 2017 13:59:46  INFO ModifierExtractorAnnotator - Starting initializing
22 Sep 2017 13:59:46  INFO ModifierExtractorAnnotator - Finished initializing
22 Sep 2017 13:59:46  INFO DegreeOfRelationExtractorAnnotator - Starting 
initializing
22 Sep 2017 13:59:46  INFO DegreeOfRelationExtractorAnnotator - Finished 
initializing
22 Sep 2017 13:59:46  INFO LocationOfRelationExtractorAnnotator - Starting 
initializing
22 Sep 2017 13:59:46  INFO LocationOfRelationExtractorAnnotator - Finished 
initializing
22 Sep 2017 13:59:46  INFO BackwardsTimeAnnotator - Starting initializing
22 Sep 2017 13:59:46  INFO BackwardsTimeAnnotator - Finished initializing
22 Sep 2017 13:59:46  INFO DocTimeRelAnnotator - Starting initializing
22 Sep 2017 13:59:48  INFO DocTimeRelAnnotator - Finished initializing
22 Sep 2017 13:59:48  INFO EventTimeRelationAnnotator - Starting initializing
22 Sep 2017 13:59:49  INFO EventTimeRelationAnnotator - Finished initializing
22 Sep 2017 13:59:49  INFO EventEventRelationAnnotator - Starting initializing
22 Sep 2017 13:59:51  INFO EventEventRelationAnnotator - Finished initializing
22 Sep 2017 13:59:51  INFO ConstituencyParser - Initializing parser...
22 Sep 2017 13:59:54  INFO RegexSectionizer - Annotating Sections ...
22 Sep 2017 13:59:55  INFO RegexSectionizer - Finished processing
22 Sep 2017 13:59:55  INFO SentenceDetectorAnnotatorBIO - Starting processing 
...
22 Sep 2017 13:59:55  INFO SentenceDetectorAnnotatorBIO - Finished processing
22 Sep 2017 13:59:55  INFO ParagraphAnnotator - Annotating Paragraphs ...
22 Sep 2017 13:59:55  INFO ParagraphAnnotator - Finished processing
22 Sep 2017 13:59:55  INFO ParagraphSentenceFixer - Adjusting Sentences 
overlapping Paragraphs ...
22 Sep 2017 13:59:55  INFO ParagraphSentenceFixer - Finished Processing
22 Sep 2017 13:59:55  INFO ListAnnotator - Annotating Lists ...
22 Sep 2017 13:59:55  INFO ListAnnotator - Finished processing
22 Sep 2017 13:59:55  INFO ListSentenceFixer - Adjusting Sentences overlapping 
Lists ...
22 Sep 2017 13:59:55  INFO ListSentenceFixer - Finished Processing
22 Sep 2017 13:59:55  INFO TokenizerAnnotatorPTB - process(JCas) in 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
22 Sep 2017 13:59:55  INFO ContextDependentTokenizerAnnotator - process(JCas)
22 Sep 2017 13:59:55  INFO POSTagger - process(JCas)
22 Sep 2017 13:59:55  INFO Chunker -  process(JCas)
22 Sep 2017 13:59:55  INFO ChunkAdjuster -  process(JCas)
22 Sep 2017 13:59:55  INFO ChunkAdjuster -  process(JCas)
22 Sep 2017 13:59:55  INFO AbstractJCasTermAnnotator - Finding Named Entities 
...
22 Sep 2017 13:59:55  INFO AbstractJCasTermAnnotator - Finished processing
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - process dev (JCas)
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:55  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:56  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:56  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:56  INFO DrugMentionAnnotator - -1
22 Sep 2017 13:59:56  INFO ClearNLPDependencyParserAE - Dependency parser 
starting with thread:pool-2-thread-1
22 Sep 2017 13:59:56  INFO ClearNLPDependencyParserAE - Dependency parser 
ending with thread:pool-2-thread-1
22 Sep 2017 13:59:56  INFO ClearNLPSemanticRoleLabelerAE - Starting processing 
...
22 Sep 2017 13:59:56  INFO ClearNLPSemanticRoleLabelerAE - Finished processing
22 Sep 2017 13:59:56  INFO CleartkAnalysisEngine - Assigning Attributes ...
22 Sep 2017 13:59:56  INFO CleartkAnalysisEngine - Finished Assigning Attributes
22 Sep 2017 13:59:56  INFO ModifierExtractorAnnotator - Starting processing ...
22 Sep 2017 13:59:56  INFO ModifierExtractorAnnotator - Finished processing
22 Sep 2017 13:59:56  INFO DegreeOfRelationExtractorAnnotator - Starting 
processing ...
22 Sep 2017 13:59:56  INFO DegreeOfRelationExtractorAnnotator - Finished 
processing
22 Sep 2017 13:59:56  INFO LocationOfRelationExtractorAnnotator - Starting 
processing ...
22 Sep 2017 13:59:57  INFO LocationOfRelationExtractorAnnotator - Finished 
processing
22 Sep 2017 13:59:57  INFO BackwardsTimeAnnotator - Starting processing ...
22 Sep 2017 13:59:57  INFO BackwardsTimeAnnotator - Finished processing
22 Sep 2017 13:59:57  INFO DocTimeRelAnnotator - Starting processing ...
22 Sep 2017 13:59:58  INFO DocTimeRelAnnotator - Finished processing
22 Sep 2017 13:59:58  INFO EventTimeRelationAnnotator - Starting processing ...
22 Sep 2017 13:59:59  INFO EventTimeRelationAnnotator - Finished processing
22 Sep 2017 13:59:59  INFO EventEventRelationAnnotator - Starting processing ...
22 Sep 2017 13:59:59  INFO EventEventRelationAnnotator - Finished processing
22 Sep 2017 13:59:59  INFO MaxentParserWrapper - Started processing: test
22 Sep 2017 14:00:02  INFO MaxentParserWrapper - Done parsing: test
22 Sep 2017 14:00:03  INFO MentionClusterCoreferenceAnnotator - Finding 
Coreferences ...
22 Sep 2017 14:00:03  INFO MentionClusterCoreferenceAnnotator - Finished.
22 Sep 2017 14:00:03  INFO HtmlTextWriter - Writing HTML to 
D:\Gandhi\ArisG\cTAKES\apache-ctakes-4.0.0\bin_old\test_output\test.txt.pretty.html
 ...
22 Sep 2017 14:00:03  INFO HtmlTextWriter - Finished Writing
22 Sep 2017 14:00:03  INFO FileTreeXmiWriter - Writing XMI to 
D:\Gandhi\ArisG\cTAKES\apache-ctakes-4.0.0\bin_old\test_output\test.txt.xmi ...
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 1; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 2; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 4; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 8; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 16; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
Sep 22, 2017 2:00:03 PM org.apache.uima.util.MessageReport 
decreasingWithTrace(51)
WARNING: Message count: 32; Feature 
org.apache.ctakes.typesystem.type.textsem.Predicate:relations is marked 
multipleReferencesAllowed=false, but it has multiple references.  These will be 
serialized in duplicate. Message count indicates messages skipped to avoid 
potential flooding. Set FINE logging level for stacktrace.
22 Sep 2017 14:00:03  INFO FileTreeXmiWriter - Finished Writing


Sean,  we have also made some code changes in MeasurementFSM.java to identify 
certain measurements like '20 mg/m2' which was not identified out of the box.  
Should we send the code changes to you so that you can consider the same to be 
productized ? Please advise.
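
For readers following the thread, here is a minimal sketch of the kind of change described above: a token-level state machine that extends a simple "number unit" measurement to a compound one such as "20 mg / m2". It is not the cTAKES MeasurementFSM code and not the patch attached to the Jira issue; the class name, the unit sets, and the method are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, NOT the cTAKES MeasurementFSM: a tiny token-level state machine. */
final class CompoundMeasurementSketch {

   private enum State { START, NUMBER, UNIT, SLASH }

   // Assumed unit vocabularies, for this illustration only.
   static private final Set<String> UNITS = new HashSet<>( Arrays.asList( "mg", "g", "mcg", "ml" ) );
   static private final Set<String> PER_UNITS = new HashSet<>( Arrays.asList( "m2", "kg", "ml", "day" ) );

   /** @return each measurement found, as its tokens joined by single spaces. */
   static List<String> findMeasurements( final List<String> tokens ) {
      final List<String> found = new ArrayList<>();
      State state = State.START;
      int begin = -1;   // index of the number token that opened the current candidate
      int end = -1;     // index of the last token known to complete a measurement
      for ( int i = 0; i <= tokens.size(); i++ ) {
         // An empty sentinel token after the last real token flushes any pending match.
         final String token = i < tokens.size() ? tokens.get( i ).toLowerCase() : "";
         if ( state == State.NUMBER && UNITS.contains( token ) ) {
            state = State.UNIT;
            end = i;
            continue;
         }
         if ( state == State.UNIT && token.equals( "/" ) ) {
            state = State.SLASH;
            continue;
         }
         if ( state == State.SLASH && PER_UNITS.contains( token ) ) {
            end = i;   // extend "20 mg" to "20 mg / m2"
            found.add( String.join( " ", tokens.subList( begin, end + 1 ) ) );
            state = State.START;
            continue;
         }
         if ( state == State.UNIT || state == State.SLASH ) {
            // No valid continuation: keep the simple measurement seen so far.
            found.add( String.join( " ", tokens.subList( begin, end + 1 ) ) );
         }
         // Restart the machine on the current token.
         if ( token.matches( "\\d+(\\.\\d+)?" ) ) {
            state = State.NUMBER;
            begin = i;
         } else {
            state = State.START;
         }
      }
      return found;
   }
}
```

The sketch assumes the text is already tokenized the way the HTML output in this thread shows it ("20", "mg", "/", "m2" as separate tokens); a fused token such as "200mg" is not handled.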

Regards,
Gandhi


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Friday, September 22, 2017 6:54 PM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

You don't need to add BackwardsTimeAnnotator to your piper.  It is added by the 
TemporalSubPipe.piper.  The  error that you are seeing regarding training is 
very strange, but you can try adding this line to the top of the file:
set isTraining=false

Can you run a sample file with your piper and send me the log statements?  It 
might help me figure out what is going on.

> is there any doc or guide on how to start writing our own annotator.
There are two example annotators in the ctakes-examples project under the ae/ 
directory.  You can look at those, but I recommend that you look at some 
information on Uimafit, which can be used to create new annotators:
https://uima.apache.org/d/uimafit-2.1.0/tools.uimafit.book.pdf
An introduction to creating Analysis Engines (Annotators) is on page 5.

Coding style is individualistic, but below is a rubberstamp that I use to get 
started:

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

/**
 * @author SPF , chip-nlp
 * @version %I%
 * @since 9/22/2017
 */
@PipeBitInfo(
      name = "Template",
      description = "For Example.", role = PipeBitInfo.Role.ANNOTATOR
)
final public class Template extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "Template" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }
}



If you use IntelliJ as your ide you can create a file template with these 
parameters:

#if (${PACKAGE_NAME} && ${PACKAGE_NAME} != "")package ${PACKAGE_NAME};#end

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

#parse("File Header.java")
@PipeBitInfo(
      name = "${NAME}",
      #if ( ${PROJECT_NAME} != "")description = "For ${PROJECT_NAME}.",#end
      role = PipeBitInfo.Role.ANNOTATOR
)
final public class ${NAME} extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "${NAME}" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }
}





-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Friday, September 22, 2017 2:23 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks again for the detailed response.

I still couldn't manage to get superscript-1 co-reference in piper GUI.  Also 
I'm not able to use "BackwardsTimeAnnotator" in piper GUI as it gives me the 
below error:

org.apache.uima.resource.ResourceInitializationException: Initialization of 
annotator class "org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator" failed.  
(Descriptor: <unknown>)
        at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
        at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
Caused by: java.lang.IllegalArgumentException: Please specify PARAM_IS_TRAINING 
- unable to infer it from context
        at org.cleartk.ml.CleartkAnnotator.initialize(CleartkAnnotator.java:109)

Somewhere in old mails it's mentioned that it's because of missing dependencies 
so I tried adding ClearTkAnnotator with no luck yet. My piper file is as 
follows:

load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
load ChunkerSubPipe.piper
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
load AttributeCleartkSubPipe.piper
load RelationSubPipe.piper
load TemporalSubPipe.piper
load CorefSubPipe.piper
add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator
add pretty.html.HtmlTextWriter
add FileTreeXmiWriter

Any suggestion on this? Also I'm using all the latest 4.0.1 cTAKES Jars. 
Regarding the identification of Names, will dig deep on what you have mentioned.

Sorry to ask this, as you already mentioned that there are no detailed docs for 
cTAKES, but is there any doc or guide on how to start writing our own annotator 
if required? If not, is there any simple annotator that you would suggest we 
look into to get a better understanding of annotators before we proceed further?  
Thanks in advance.

Regards,
Gandhi


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Thursday, September 21, 2017 7:59 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

> We guess we are missing out on something as we could not find co-references 
> for "200mg". Should we add anymore piper for this?
The piper commands that I sent have everything needed to obtain coreferences.  I use 
them regularly - they are what I used on your example sentence to get the 
coreferences that I mentioned.

> Also the change mentioned in the thread ...
That is a very old thread and I don't think that it applies to what you are 
trying to do.

> We also have a requirement to identify the patient names and sex
As James said, ctakes isn't really meant to do this.  Ctakes is geared toward 
extracting clinical data, and to this point names have not fallen into that 
category.  It is more a task for general nlp.  There is an opennlp model that 
can identify names, and there are a few others (I used to see names using GATE).  
ctakes has wrapped opennlp for other tasks and you should be able to do the same 
to adapt an engine for names into ctakes.

> cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06
> / 07 / 02 or 27Aug2002
As Chen mentioned, the BackwardsTimeAnnotator module uses an ML model trained on 
gold data.  It isn't perfect.  You can add another time annotator on top of 
this to get some of the more simply formatted date mentions - there are a lot 
of them out there.  Personally I have used jchronic as it can be easily tweaked 
to recognize medically-relevant temporal expressions relating to surgery, 
pharmacology, etc.
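
As a rough illustration of what such an extra pass for simply formatted dates could look like, here is a minimal regular-expression sketch covering the four formats quoted above (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002). It is not jchronic and not part of ctakes; the class and method names are hypothetical, the month list assumes English three-letter abbreviations, and turning the matched spans into ctakes time annotations is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, NOT part of ctakes: finds simply formatted date mentions. */
final class SimpleDateFinder {

   // Assumption: English three-letter month abbreviations, capitalized as in the note.
   static private final String MONTH = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";

   // Alternatives, in order:
   //   20Aug02 or 27Aug2002
   //   20/Aug/02        (spaces around the slashes allowed)
   //   06/07/02         (spaces around the slashes allowed, e.g. "06 / 07 / 02")
   static private final Pattern DATE = Pattern.compile(
         "\\b(?:\\d{1,2}" + MONTH + "(?:\\d{4}|\\d{2})"
         + "|\\d{1,2}\\s*/\\s*" + MONTH + "\\s*/\\s*(?:\\d{4}|\\d{2})"
         + "|\\d{1,2}\\s*/\\s*\\d{1,2}\\s*/\\s*(?:\\d{4}|\\d{2}))\\b" );

   /** @return the matched date strings, in the order they appear in the text. */
   static List<String> findDates( final String text ) {
      final List<String> dates = new ArrayList<>();
      final Matcher matcher = DATE.matcher( text );
      while ( matcher.find() ) {
         dates.add( matcher.group() );
      }
      return dates;
   }
}
```

Two-part mentions such as "03/02" are deliberately not matched here, since a bare number pair is ambiguous without more context.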

Sean


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Wednesday, September 20, 2017 8:50 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to 
it soon.

Make sure that you are running the latest version in trunk.

Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Wednesday, September 20, 2017 7:03 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi, Could someone help me out on the below queries please?

Regards,
Gandhi

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Tuesday, September 19, 2017 8:51 PM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing something, as we 
could not find co-references for "200mg". Should we add any more piper commands 
for this?

Also the change mentioned in the thread - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc&e=
  is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship with the patient.
Eg. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g " On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi


-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Monday, September 18, 2017 10:02 PM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw some code changes need to be done to use drug-ner module. Is it 
> still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you run the sentence, the date was identified. 
> Where and how exactly did you run it so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFiles&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=9ueuHYwEywok8byBXEkVjmTWiChmaIY3ryB4Pi6ajRo&e=
I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p <FileAsAbove>.piper -i org/apache/ctakes/examples/notes -o <OutputDir> --user <MyUmlsUser> --pass <MyUmlsPass>
You can run it by command line by substituting 
"org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with 
"bin/runPiperFile".
You can also run it through a ctakes 4.0.1 (trunk) gui.  See 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFile-2BSubmitter-2BGUI&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=VWIrXrfA2dZ8KHOdoizJo-nTx7nPSy4GDOZ7IxQteIQ&e=

> I'm not able to see any clickable option in HTML output
You must have the HtmlTextWriter at the end of your pipeline to produce html 
files.  To keep the xml file output, place "add FileTreeXmiWriter" at the end 
of the piper.

> Apologies for too many
No worries, we are happy to have your interest!

Sean


-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:[email protected]]
Sent: Saturday, September 16, 2017 7:01 AM
To: [email protected]
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the prompt response. Appreciate your input on adding 
DrugMentionAnnotator. Actually, we are relying on pretty printer output just to 
understand the analysis. Our logic to extract disorders and findings is based 
on the XML file generated by 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_healthnlp_examples_blob_master_ctakes-2Dtemporal-2Ddemo_src_main_java_org_apache_ctakes_web_client_servlet_DemoServlet.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=_MJKBj93YJdd5aa84dBvqtg6o-BKBn7UcbfF660CEBI&s=g8UzBHRoOyn1hoRABKSC6EtPMvwOSSggviRmWCHKti4&e=
   So in this case will we be able to see drug attributes in the output XML?

In one of the old posts 
(https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=_MJKBj93YJdd5aa84dBvqtg6o-BKBn7UcbfF660CEBI&s=iT_1UGR98APO80UaZsaCBHseMqF4M4PfItgokD27r5c&e=
  ) we also saw that some code changes need to be done to use the drug-ner 
module. Is that still valid? Also you mentioned that the drug-ner module is out 
of date. Does that mean it cannot be used, or that it may not provide accurate 
analysis? Also, what changes need to be done to bring it up to date, so that we 
can try them if you can assist?

You also mentioned that when you run the sentence, the date was identified. 
Where and how exactly did you run it so that we can check the same? Also 
regarding your explanation of coreference, I'm not able to see any clickable 
option in the HTML output. So we wanted to understand how we can run and check 
that too.

Apologies for too many questions, as we are just a week old in NLP and cTAKES. 
Thanks in advance.

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you are not the named addressee you should not disseminate, distribute or copy 
this e-mail. Please notify the sender or system manager by email immediately if 
you have received this e-mail by mistake and delete this e-mail from your 
system. If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited and against the law.