Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
So perhaps we could re-train it to disambiguate newline characters as well?
Steve
On May 21, 2013, at 11:33 AM, "Savova, Guergana" wrote:
> The model is trained to disambiguate punctuation characters which in most
> cases is the period.
> --Guergana
>
> -----Original Message-----
> From: Ste

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
The model is trained to disambiguate punctuation characters which in most cases is the period.
--Guergana
-----Original Message-----
From: Steven Bethard [mailto:steven.beth...@colorado.edu]
Sent: Tuesday, May 21, 2013 12:07 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline beh

RE: sentence detector newline behavior

2013-05-21 Thread Chen, Pei
I presume the combination turned out to perform the best in the past...? (based on James and Guergana's enum/medication examples) Having a flag to turn off the hard newline rule seems reasonable if it works. My 1/2 cent... (short of having to preprocess MIMIC Radiology formatted notes or retrainin

Re: training-data.libsvm Vs model.libsvm (RelationExtractor)

2013-05-21 Thread giri vara prasad nambari
Thanks Steve! I see the new/current repository location. Will go through that.
Thank you,
Giri

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
On May 21, 2013, at 9:53 AM, "Savova, Guergana" wrote:
> The OpenNLP sentence segmenter is trained on clinical data (cannot remember
> exactly how many sentences were in the training corpus). This is the model
> distributed with cTAKES. The only hard rule is the new line.
If it's trained on cl

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
The OpenNLP sentence segmenter is trained on clinical data (cannot remember exactly how many sentences were in the training corpus). This is the model distributed with cTAKES. The only hard rule is the new line.
--Guergana
-----Original Message-----
From: Steven Bethard [mailto:steven.beth...@co

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
On May 21, 2013, at 9:02 AM, Tim Miller wrote:
> I think the whole reason to use a machine learning approach for sentence
> detection should be to help weigh evidence with these cases where hard
> rules cause problems, mainly 1) when a period does not end a sentence,
> but also 2) where a newl

Re: sentence detector newline behavior

2013-05-21 Thread Tim Miller
I think the whole reason to use a machine learning approach for sentence detection should be to help weigh evidence with these cases where hard rules cause problems, mainly 1) when a period does not end a sentence, but also 2) where a newline does and does not mean end of sentence. It is of cou

RE: sentence detector newline behavior

2013-05-21 Thread Masanz, James J.
+1 for adding a boolean parameter, or perhaps instead a list of section IDs
The sentence detector model was trained on data that always breaks at carriage returns. It is important for text that is a list something like this:
Heart Rate: normal
ENT: negative
EXTRAVASCULAR FINDINGS: Severe pros
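A rough sketch of what such a boolean parameter might look like, for readers following the thread. This is not the cTAKES SentenceDetector code: the function names, the `break_on_newline` flag, and the regex-based `naive_detector` (a stand-in for the trained OpenNLP model) are all assumptions made for illustration, and the wrapped radiology sentence is a made-up example. It only shows the trade-off being discussed: the hard newline rule helps list-style sections but splits sentences that wrap across lines.

```python
import re
from typing import Callable, List


def naive_detector(text: str) -> List[str]:
    """Hypothetical stand-in for a trained model: split after '.', '!' or '?'
    when followed by whitespace. The real model is a trained classifier."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_sentences(
    text: str,
    break_on_newline: bool = True,
    detector: Callable[[str], List[str]] = naive_detector,
) -> List[str]:
    """Split text into sentences.

    break_on_newline=True  : every newline is a hard sentence break, and the
                             detector only runs within each line.
    break_on_newline=False : newlines are treated as ordinary whitespace, so
                             the detector alone decides where sentences end.
    """
    if break_on_newline:
        sentences: List[str] = []
        for line in text.splitlines():
            sentences.extend(detector(line))
        return sentences
    # Join wrapped lines with a single space before running the detector.
    joined = " ".join(line.strip() for line in text.splitlines() if line.strip())
    return detector(joined)


if __name__ == "__main__":
    listing = "Heart Rate: normal\nENT: negative"
    wrapped = "The lungs are clear without\nfocal consolidation. No effusion."
    for flag in (True, False):
        print(flag, split_sentences(listing, break_on_newline=flag))
        print(flag, split_sentences(wrapped, break_on_newline=flag))
```

With the flag on, the list keeps one item per sentence but the wrapped sentence is cut in two; with it off, the wrapped sentence is recovered but the list items run together. That is the tension a per-section setting (the "list of section IDs" idea) or a retrained model would have to resolve.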

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
In the clinical narrative there are many sections that are enumerations and where a new line character must be treated as a sentence break. For example, Current Medications in which each line contains a medication and its signature. The format of the MIMIC notes is a bit strange as there are man

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
On May 21, 2013, at 6:07 AM, "Miller, Timothy" wrote:
> The sentence detector always ends a sentence where there are newlines.
> This is a problem for some notes (e.g. MIMIC radiology notes) where a
> line can wrap in the middle of a sentence at specified character
> offsets. In the comments for

Re: training-data.libsvm Vs model.libsvm (RelationExtractor)

2013-05-21 Thread Steven Bethard
On May 20, 2013, at 9:34 PM, giri vara prasad nambari wrote:
> Here is the link where I found these files:
> https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.0.0-incubating/ctakes-relation-extractor/resources/models/modifier_extractor/
I see. You're not working from the current repository,

sentence detector newline behavior

2013-05-21 Thread Miller, Timothy
The sentence detector always ends a sentence where there are newlines. This is a problem for some notes (e.g. MIMIC radiology notes) where a line can wrap in the middle of a sentence at specified character offsets. In the comments for SentenceDetector, it seems to be split up very logically in tha