>
> --Pei
>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>> Sent: Monday, September 29, 2014 2:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: sentence detector model
That does sound like it would be useful since MIMIC does have both kinds
of linebreak styles in different notes. If I did some annotations on
such a dataset, would it be re-distributable, say on the PhysioNet
website? I believe the ShARe project has a download site there (it is a
layer of annotation on top of MIMIC notes).
That sounds like it would be perfect for this task.
On Monday, September 29, 2014, Peter Szolovits wrote:
I have a set of about 27K documents from MIMIC (circa 2009) in which I have
replaced the weird PHI markers by synthesized pseudonymous data. These have
natural sentence breaks (typically in the middle of lines), normal paragraph
structure, bulleted lists, etc. Assuming it goes to people who ha…
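The marker replacement Peter describes might look roughly like this. The bracketed marker labels and the pseudonym table below are invented for illustration; MIMIC's actual placeholder inventory and his pipeline may differ:

```python
import re

# MIMIC de-identified notes mark removed PHI with bracketed placeholders
# such as [**First Name**]. This sketch swaps a few marker types for
# synthesized stand-ins. Labels and replacements here are illustrative only.
PSEUDONYMS = {
    "Known lastname": "Smith",
    "First Name": "John",
    "Hospital": "General Hospital",
}

def pseudonymize(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Drop any trailing numeric id inside the marker, e.g. "lastname 123".
        label = re.sub(r"\s*\d*\s*$", "", m.group(1)).strip()
        return PSEUDONYMS.get(label, "X")
    return re.sub(r"\[\*\*(.*?)\*\*\]", repl, text)

note = "Pt is [**First Name**] [**Known lastname 123**], seen at [**Hospital**]."
print(pseudonymize(note))
# -> Pt is John Smith, seen at General Hospital.
```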
How about pairing it with THYME and MiPACQ? Perhaps you are using them
already...
--Guergana
-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 29, 2014 1:38 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector model
Some of them are a bit artificial for this task, with notes being
annotated as one sentence per line and offset punctuation. I think maybe
the 2008 and 2009 data might have original formatting though, with
newlines not always breaking sentences. That has certain advantages over
raw MIMIC for training.
Why not use the i2b2 corpora?
On Monday, September 29, 2014, Dligach, Dmitriy
<dmitriy.dlig...@childrens.harvard.edu> wrote:
Maybe creating a made-up set of sentences would be an option? That way we could
agree on the annotation of concrete cases. Although this would be more of a
unit test than a corpus.
Dima
On Sep 27, 2014, at 12:15, Miller, Timothy wrote:
I've just been using the opennlp command line cross validator on the small
dataset I annotated (along with some eyeballing). It would be cool if there
were a standard clinical resource available for this task, but I hadn't
considered it much because the data I annotated pulls from multiple datasets.
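The k-fold evaluation Tim describes can be sketched in plain Python; the `evaluate` callback here is a hypothetical stand-in for an actual trainer and scorer:

```python
import random

def cross_validate(examples, k, evaluate):
    """Split annotated examples into k folds; train on k-1 folds and score
    the held-out fold with the caller-supplied evaluate(train, test)."""
    random.Random(0).shuffle(examples)  # deterministic shuffle (in place)
    folds = [examples[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k

# Toy stand-in evaluator, just to exercise the harness: each held-out fold
# contains 10 of the 100 toy examples.
data = list(range(100))
score = cross_validate(data, 10, lambda tr, te: len(te) / (len(tr) + len(te)))
print(round(score, 3))
```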
Tim, thanks for working on this!
Question: do we have some formal way of evaluating the sentence detector? Maybe
we should come up with some dev set that would include examples from MIMIC...
Dima
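One simple formal metric for the evaluation Dima asks about would be precision/recall over predicted sentence-boundary character offsets; a minimal sketch (the function name and offsets are illustrative):

```python
def boundary_prf(gold, predicted):
    """Precision, recall, and F1 over sets of sentence-boundary offsets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # boundaries found in both
    p = tp / len(predicted) if predicted else 0.0   # precision
    r = tp / len(gold) if gold else 0.0             # recall
    f = 2 * p * r / (p + r) if p + r else 0.0       # harmonic mean
    return p, r, f

# Two of three predicted boundaries match the gold annotation:
p, r, f = boundary_prf(gold={10, 25, 40}, predicted={10, 25, 60})
print(p, r, f)
```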
On Sep 27, 2014, at 8:57, Miller, Timothy wrote:
> I have been working on the sentence detector…