Re: sentence detector model

2014-09-29 Thread Koola, Jejo David
> > --Pei > > >> -Original Message- >> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] >> Sent: Monday, September 29, 2014 2:47 PM >> To: dev@ctakes.apache.org >> Subject: Re: sentence detector model >> >> That do

RE: sentence detector model

2014-09-29 Thread Chen, Pei
> That does sound like it would be useful since MIMIC does have both kinds of linebreak styles in different notes. If I did some annotations on such a dataset would it be re-distributable, sa

Re: sentence detector model

2014-09-29 Thread Miller, Timothy
That does sound like it would be useful since MIMIC does have both kinds of linebreak styles in different notes. If I did some annotations on such a dataset would it be re-distributable, say on the physionet website? I believe the ShARe project has a download site there (it is a layer of annotation

Re: sentence detector model

2014-09-29 Thread Karthik Sarma
That sounds like it would be perfect for this task. On Monday, September 29, 2014, Peter Szolovits wrote: > I have a set of about 27K documents from MIMIC (circa 2009) in which I > have replaced the weird PHI markers by synthesized pseudonymous data. > These have natural sentence breaks (typicall

Re: sentence detector model

2014-09-29 Thread Peter Szolovits
I have a set of about 27K documents from MIMIC (circa 2009) in which I have replaced the weird PHI markers by synthesized pseudonymous data. These have natural sentence breaks (typically in the middle of lines), normal paragraph structure, bulleted lists, etc. Assuming it goes to people who ha

RE: sentence detector model

2014-09-29 Thread Savova, Guergana
How about pairing it with THYME and MiPACQ? Perhaps you are using them already... --Guergana -Original Message- From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, September 29, 2014 1:38 PM To: dev@ctakes.apache.org Subject: Re: sentence detector model

Re: sentence detector model

2014-09-29 Thread Miller, Timothy
Some of them are a bit artificial for this task, with notes being annotated as one sentence per line and offset punctuation. I think maybe the 2008 and 2009 data might have original formatting though, with newlines not always breaking sentences. That has certain advantages over raw MIMIC for traini

Re: sentence detector model

2014-09-29 Thread vijay garla
Why not use the i2b2 corpora? On Monday, September 29, 2014, Dligach, Dmitriy < dmitriy.dlig...@childrens.harvard.edu> wrote: > Maybe creating a made-up set of sentences would be an option? That way we > could agree on the annotation of concrete cases. Although this would be > more of a unit test

Re: sentence detector model

2014-09-29 Thread Dligach, Dmitriy
Maybe creating a made-up set of sentences would be an option? That way we could agree on the annotation of concrete cases. Although this would be more of a unit test than a corpus. Dima On Sep 27, 2014, at 12:15, Miller, Timothy wrote: > I've just been using the opennlp command line cross

Re: sentence detector model

2014-09-27 Thread Miller, Timothy
I've just been using the opennlp command line cross validator on the small dataset I annotated (along with some eyeballing). It would be cool if there was a standard clinical resource available for this task, but I hadn't considered it much because the data I annotated pulls from multiple datase

Re: sentence detector model

2014-09-27 Thread Dligach, Dmitriy
Tim, thanks for working on this! Question: do we have some formal way of evaluating the sentence detector? Maybe we should come up with some dev set that would include examples from MIMIC... Dima On Sep 27, 2014, at 8:57, Miller, Timothy wrote: > I have been working on the sentence detect
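
One possible shape for the "formal way of evaluating" asked about above is to score predicted sentence boundaries against gold boundaries with precision, recall, and F1. The sketch below is only an illustration, not something taken from cTAKES or OpenNLP: the function names `sentence_ends` and `boundary_prf` are hypothetical, the two example sentences are made up, and it assumes that gold and predicted sentences cover the same text, joined by a single space.

```python
def sentence_ends(sentences):
    """Return the end offset of each sentence.

    Assumption: the sentences are joined by exactly one space, so
    offsets from gold and predicted splits of the same text line up.
    """
    ends = []
    pos = 0
    for sentence in sentences:
        pos += len(sentence)
        ends.append(pos)
        pos += 1  # skip the single separating space
    return ends


def boundary_prf(gold_ends, predicted_ends):
    """Precision, recall, and F1 over sentence-end offsets."""
    gold = set(gold_ends)
    predicted = set(predicted_ends)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: the detector wrongly splits after the abbreviation "Dr."
gold = ["Pt seen by Dr. Smith today.", "No acute distress."]
predicted = ["Pt seen by Dr.", "Smith today.", "No acute distress."]

precision, recall, f1 = boundary_prf(sentence_ends(gold), sentence_ends(predicted))
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring on offsets rather than on whole sentences means a single spurious split costs one false positive instead of invalidating both neighboring sentences; whether that is the right trade-off for clinical notes is a choice the dev set discussion would need to settle.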