So perhaps we could re-train it to disambiguate newline characters as well?
Steve
On May 21, 2013, at 11:33 AM, "Savova, Guergana"
wrote:
> The model is trained to disambiguate punctuation characters which in most
> cases is the period.
> --Guergana
>
> -Original Message-
> From: Ste
The model is trained to disambiguate punctuation characters which in most cases
is the period.
--Guergana
-Original Message-
From: Steven Bethard [mailto:steven.beth...@colorado.edu]
Sent: Tuesday, May 21, 2013 12:07 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline beh
I presume the combination turned out to perform the best in the past...? (based
on James and Guergana's enum/medication examples)
Having a flag to turn off the hard newline rule seems reasonable if it works.
My 1/2 cent...
(short of having to preprocess MIMIC Radiology formatted notes or retrainin
Thanks Steve!
I see the new/current repository location. Will go through that.
Thank you,
Giri
On May 21, 2013, at 9:53 AM, "Savova, Guergana"
wrote:
> The OpenNLP sentence segmenter is trained on clinical data (cannot remember
> exactly how many sentences were in the training corpus). This is the model
> distributed with cTAKES. The only hard rule is the new line.
If it's trained on cl
The OpenNLP sentence segmenter is trained on clinical data (cannot remember
exactly how many sentences were in the training corpus). This is the model
distributed with cTAKES. The only hard rule is the new line.
--Guergana
-Original Message-
From: Steven Bethard [mailto:steven.beth...@co
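A minimal sketch of the flag idea discussed in this thread, in Python rather than the cTAKES Java code. Everything here is an assumption for illustration: the function name split_sentences, the break_on_newline parameter, and the regex that stands in for the trained OpenNLP model are hypothetical and are not part of cTAKES or OpenNLP. It only shows how the hard newline rule could be switched off so that wrapped lines are rejoined before punctuation is considered.

```python
import re

# Naive stand-in for the trained model (assumption): treat '.', '!' or '?'
# followed by whitespace as a sentence boundary. The real detector weighs
# evidence with a statistical model instead of a fixed pattern.
_PUNCT_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text, break_on_newline=True):
    """Hypothetical sketch, not the cTAKES SentenceDetector.

    break_on_newline=True  : every newline is a hard sentence break.
    break_on_newline=False : newlines are treated as ordinary whitespace,
                             so a line wrapped mid-sentence is rejoined.
    """
    if break_on_newline:
        chunks = text.splitlines()
    else:
        # Collapse all whitespace, including newlines, to single spaces.
        chunks = [" ".join(text.split())]

    sentences = []
    for chunk in chunks:
        for piece in _PUNCT_BREAK.split(chunk.strip()):
            if piece:
                sentences.append(piece)
    return sentences


# A note wrapped at a fixed column, as described for MIMIC radiology notes.
wrapped = "The heart is normal in\nsize. No effusion."
# An enumeration where each line is its own item.
listing = "Heart Rate: normal\nENT: negative"

print(split_sentences(wrapped, break_on_newline=True))
print(split_sentences(wrapped, break_on_newline=False))
print(split_sentences(listing, break_on_newline=True))
print(split_sentences(listing, break_on_newline=False))
```

With the flag on, the wrapped sentence is cut at the line wrap while the enumeration is split correctly; with the flag off, the wrapped sentence is recovered but the enumeration lines run together. That trade-off is why a per-section setting or a retrained model was also suggested.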
On May 21, 2013, at 9:02 AM, Tim Miller
wrote:
> I think the whole reason to use a machine learning approach for sentence
> detection should be to help weigh evidence with these cases where hard
> rules cause problems, mainly 1) when a period does not end a sentence,
> but also 2) where a newl
I think the whole reason to use a machine learning approach for sentence
detection should be to help weigh evidence with these cases where hard
rules cause problems, mainly 1) when a period does not end a sentence,
but also 2) where a newline does and does not mean end of sentence. It
is of cou
+1 for adding a boolean parameter, or perhaps instead a list of section IDs
The sentence detector model was trained on data that always breaks at carriage
returns.
It is important for text that is a list, something like this:
Heart Rate: normal
ENT: negative
EXTRAVASCULAR FINDINGS: Severe pros
In the clinical narrative there are many sections that are enumerations and
where a new line character must be treated as a sentence break. For example,
Current Medications in which each line contains a medication and its signature.
The format of the MIMIC notes is a bit strange as there are man
On May 21, 2013, at 6:07 AM, "Miller, Timothy"
wrote:
> The sentence detector always ends a sentence where there are newlines.
> This is a problem for some notes (e.g. MIMIC radiology notes) where a
> line can wrap in the middle of a sentence at specified character
> offsets. In the comments for
On May 20, 2013, at 9:34 PM, giri vara prasad nambari
wrote:
> Here is the link where I found these files:
> https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.0.0-incubating/ctakes-relation-extractor/resources/models/modifier_extractor/
I see. You're not working from the current repository,
The sentence detector always ends a sentence where there are newlines.
This is a problem for some notes (e.g. MIMIC radiology notes) where a
line can wrap in the middle of a sentence at specified character
offsets. In the comments for SentenceDetector, it seems to be split up
very logically in tha