Thank you, Sean, for the response. Sorry that the images were not visible to 
you. I also forgot to mention that we are using version 4.0. 
I am restating the details below.

The first image showed how the Sentence object looks in the CAS viewer.
The second image showed the list of end-of-sentence candidates in the class 
‘EOSScannerImpl’, as below:
     private static final char[] eosCandidates = { '.', '!', ')', ']', '>', '/''', ':', ';' };



So any modification to the sentence extractor would affect every downstream 
module, right? We will definitely look into the AEs you mentioned. Do you mean 
that we should try adding the EolSentenceFixer and MrsDrSentenceJoiner AEs, 
which would refine the sentence extraction?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Thursday, June 11, 2020 8:20 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector changes [EXTERNAL]



Hi Abad,


None of your embedded images are visible to me, so I don't have whatever 
information is contained within those images.


It sounds like you are using the SentenceDetectorBIO.  Very cool.


It does have a few idiosyncrasies, one of which you have identified.


There are two helper AEs in ctakes-core that might be useful for you.  They are 
not in the released (4.0) version of ctakes, only in ctakes trunk.


EolSentenceFixer

Re-annotates Sentences based upon short lines, preventing a Sentence from 
spanning over an intentional line break.

The BIO will often lump short (intentionally separated) lines into a single 
sentence.  This attempts to detect such intentionally short lines and split 
them.
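
The short-line idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the EolSentenceFixer source: the class name, the method name, and the length threshold are all hypothetical, and a real AE would work on Sentence annotations in the CAS rather than on strings.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT the EolSentenceFixer code. All names are hypothetical.
class ShortLineSplitDemo {

    // Splits one over-long "sentence" at line breaks that follow a short line.
    // maxShortLineLength is an assumed threshold: a line at or below this length
    // is treated as intentionally short (for example a heading or list item).
    static List<String> splitAtShortLines(String sentence, int maxShortLineLength) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String[] lines = sentence.split("\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(line);
            boolean isLast = (i == lines.length - 1);
            // A short line that is followed by more text closes the current piece.
            if (!isLast && line.length() <= maxShortLineLength) {
                pieces.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            pieces.add(current.toString());
        }
        return pieces;
    }

    public static void main(String[] args) {
        String lumped = "Allergies: none\nMedications: aspirin\n"
                + "The patient reports feeling well and denies any chest pain or\n"
                + "shortness of breath";
        for (String piece : splitAtShortLines(lumped, 40)) {
            System.out.println(piece);
        }
    }
}
```

In this sketch the two short header-like lines become their own pieces, while the long wrapped line is kept together with its continuation.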


MrsDrSentenceJoiner

Joins Sentences with person titles Mr. Mrs. Dr. that have been split by 
SentenceDetectorBIO.


You can peek at the code in MrsDrSentenceJoiner and do something similar to 
repair cases in which other text, such as ')', has caused improper splits.
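
As a rough sketch of what such a repair could look like, the plain-Java example below merges a sentence into the previous one when the previous one ends with ')' and the next one starts with a lowercase letter. It is not the MrsDrSentenceJoiner code and does not use the UIMA API: the Span class, the method names, and the lowercase heuristic are assumptions made for illustration. A real AE would remove and re-add Sentence annotations in the CAS instead.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT cTAKES code. All names here are hypothetical.
class ParenSentenceJoinDemo {

    // Stand-in for a sentence annotation: [begin, end) character offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns a new list in which improperly split neighbours are merged.
    static List<Span> joinAfterCloseParen(String text, List<Span> sentences) {
        List<Span> joined = new ArrayList<>();
        for (Span current : sentences) {
            if (!joined.isEmpty()) {
                Span previous = joined.get(joined.size() - 1);
                if (shouldJoin(text, previous, current)) {
                    joined.set(joined.size() - 1, new Span(previous.begin, current.end));
                    continue;
                }
            }
            joined.add(current);
        }
        return joined;
    }

    // Assumed heuristic: the previous sentence ends with ')' and the next one
    // begins with a lowercase letter, which suggests the text simply continues.
    private static boolean shouldJoin(String text, Span previous, Span next) {
        if (previous.end <= previous.begin || next.end <= next.begin) {
            return false;
        }
        return text.charAt(previous.end - 1) == ')'
                && Character.isLowerCase(text.charAt(next.begin));
    }

    public static void main(String[] args) {
        String text = "Patient was taking Paracetamol (650 mg) thrice daily";
        int split = text.indexOf(')') + 1;
        List<Span> sentences = new ArrayList<>();
        sentences.add(new Span(0, split));
        sentences.add(new Span(split + 1, text.length()));
        for (Span s : joinAfterCloseParen(text, sentences)) {
            System.out.println(text.substring(s.begin, s.end));
        }
    }
}
```

The lowercase check is deliberately conservative, so a genuinely new sentence that follows a closing parenthesis and starts with a capital letter is left alone.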


Because Sentence boundaries are often used in downstream processing (Mentions, 
Relations), it is very important that they be properly assigned.


Sean



________________________________
From: abad.ay...@cognizant.com <abad.ay...@cognizant.com>
Sent: Thursday, June 11, 2020 10:17 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Sentence detector changes [EXTERNAL]



Hi Team,

We are trying to make full use of cTAKES to meet the requirements for our 
profile, one of which is to extract the sentences from a medical document. We 
have seen that cTAKES already provides the list of sentences in the clinical 
text within the object shown below

[cid:image002.png@01D64027.944FD390]


We also noticed that sentences are delimited by the predefined delimiters 
below. This is a problem for our requirement, because sentences are segregated 
whenever one of these tokens is encountered.

[cid:image005.jpg@01D64029.1E6AC980]

For example, “Patient was taking Paracetamol (650 mg) thrice daily” was split 
into two different sentences (because a ‘)’ was encountered):


1.     Patient was taking Paracetamol (650 mg)

2.     thrice daily
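
The effect can be reproduced with a small stand-alone sketch. It assumes a naive splitter that breaks at every candidate character, which is a deliberate simplification and not the cTAKES sentence detector itself. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: splits at EVERY candidate character. This is a simplified
// stand-in, not the cTAKES implementation. All names are hypothetical.
class NaiveEosSplitDemo {

    static List<String> split(String text, char[] eosCandidates) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isCandidate(text.charAt(i), eosCandidates)) {
                // Keep the delimiter with the sentence it closes.
                addTrimmed(sentences, text.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Whatever follows the last delimiter is the final sentence.
        addTrimmed(sentences, text.substring(start));
        return sentences;
    }

    private static boolean isCandidate(char c, char[] candidates) {
        for (char candidate : candidates) {
            if (c == candidate) {
                return true;
            }
        }
        return false;
    }

    private static void addTrimmed(List<String> out, String s) {
        String trimmed = s.trim();
        if (!trimmed.isEmpty()) {
            out.add(trimmed);
        }
    }

    public static void main(String[] args) {
        String text = "Patient was taking Paracetamol (650 mg) thrice daily";
        // With ')' among the candidates the text is cut after "(650 mg)".
        System.out.println(split(text, new char[] {'.', '!', ')'}));
        // With only '.' as a candidate the text stays in one piece.
        System.out.println(split(text, new char[] {'.'}));
    }
}
```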


So we tried to customize it by removing some of the defined delimiters to meet 
our requirement. We tried with just ‘.’ as the delimiter and found that 
sentences are split whenever a ‘.’ is encountered. Since this is a change made 
in the core module, we would like to know whether it will affect the clinical 
token identification process or any of the information already provided, such 
as tlink, timex, or any other critical attribute. 
Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.
