Maybe creating a made-up set of sentences would be an option? That way we could
agree on the annotation of concrete cases. Although this would be more of a
unit test than a corpus.
Dima
On Sep 27, 2014, at 12:15, Miller, Timothy wrote:
> I've just been using the opennlp command line cross
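(For reference, the OpenNLP command-line SentenceDetectorTrainer and CrossValidator tools wrap the Java API; a rough sketch against the 1.5.x-era API is below. The file names are placeholders, and the exact train(...) signature differs across OpenNLP versions.)

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;

import opennlp.tools.sentdetect.SentenceDetectorEvaluator;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.SentenceSampleStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class SentenceModelSketch {
  public static void main(String[] args) throws Exception {
    Charset utf8 = Charset.forName("UTF-8");

    // Training data format: one sentence per line, empty line between documents.
    ObjectStream<String> trainLines =
        new PlainTextByLineStream(new FileInputStream("sent.train"), utf8);
    SentenceModel model = SentenceDetectorME.train(
        "en", new SentenceSampleStream(trainLines), true, null,
        TrainingParameters.defaultParams());

    // Persist the model so it can be dropped into a pipeline later.
    try (OutputStream out = new FileOutputStream("mimic-sent.bin")) {
      model.serialize(out);
    }

    // Score a held-out file; the CLI cross-validator handles the fold splitting itself.
    ObjectStream<String> testLines =
        new PlainTextByLineStream(new FileInputStream("sent.test"), utf8);
    SentenceDetectorEvaluator evaluator =
        new SentenceDetectorEvaluator(new SentenceDetectorME(model));
    evaluator.evaluate(new SentenceSampleStream(testLines));
    System.out.println(evaluator.getFMeasure());
  }
}

This sketch just trains on one file and scores a held-out one; the command-line cross-validator mentioned above performs the k-fold splitting on a single data file.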
Why not use the i2b2 corpora?
On Monday, September 29, 2014, Dligach, Dmitriy <
dmitriy.dlig...@childrens.harvard.edu> wrote:
> Maybe creating a made-up set of sentences would be an option? That way we
> could agree on the annotation of concrete cases. Although this would be
> more of a unit test than a corpus.
Some of them are a bit artificial for this task, with notes being
annotated as one sentence per line and offset punctuation. I think maybe
the 2008 and 2009 data might have original formatting though, with
newlines not always breaking sentences. That has certain advantages over
raw MIMIC for training.
How about pairing it with THYME and MiPACQ? Perhaps you are using them
already...
--Guergana
-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 29, 2014 1:38 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector model
Som
I have a set of about 27K documents from MIMIC (circa 2009) in which I have
replaced the weird PHI markers by synthesized pseudonymous data. These have
natural sentence breaks (typically in the middle of lines), normal paragraph
structure, bulleted lists, etc. Assuming it goes to people who ha
Hello All,
I am working on a use case for lab test data using cTAKES, and my online
search to find a test dataset has been futile. I'd greatly appreciate it if
someone can share such a dataset or can point me in the right direction to
go looking for one.
Best,
Ajay
--
Founder & CEO
Mobile Insigh
Ajay, I'm confused by your query. cTAKES is good at interpreting text, but
most lab test results are reported in tabular form that is most appropriately
searched by SQL queries. Sometimes lab results are also reported in narrative
notes, but parsing those is often more a matter of deciphering
That sounds like it would be perfect for this task
On Monday, September 29, 2014, Peter Szolovits wrote:
> I have a set of about 27K documents from MIMIC (circa 2009) in which I
> have replaced the weird PHI markers by synthesized pseudonymous data.
> These have natural sentence breaks (typically in the middle of lines),
> normal paragraph structure, bulleted lists, etc.
That does sound like it would be useful since MIMIC does have both kinds
of linebreak styles in different notes. If I did some annotations on
such a dataset, would it be re-distributable, say on the PhysioNet
website? I believe the ShARe project has a download site there (it is a
layer of annotation
Ajay,
cTAKES currently does not implement a method to discover labs from the text.
The motivation is that you can get that easily from the structured part of the
EMR (what Pete explained below). Hope this makes sense!
--Guergana
-----Original Message-----
From: Peter Szolovits [mailto:p...@mit.e
Assuming we have a representative training set, are there any objections if we
default cTAKES to this SentenceAnnotator + Model?
For the upcoming release:
- Consolidate the existing sentence detector and the ytex sentence detector into
this new one?
- Add a config parameter to still allow an override of
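If the override stays, a rough uimaFIT-style sketch of what the wiring could look like is below. The annotator class, parameter name, and default model path are placeholders for illustration, not the actual cTAKES identifiers.

import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

public class SentenceAnnotatorSketch extends JCasAnnotator_ImplBase {

  // Hypothetical parameter name; the real annotator would define its own.
  public static final String PARAM_MODEL_PATH = "modelPath";

  @ConfigurationParameter(name = PARAM_MODEL_PATH, mandatory = false,
      defaultValue = "org/apache/ctakes/core/models/sentdetect/model.bin",
      description = "Classpath or file location of the sentence detector model")
  private String modelPath;

  @Override
  public void initialize(UimaContext context) throws ResourceInitializationException {
    super.initialize(context);
    // Load the sentence model from modelPath here (omitted in this sketch).
  }

  @Override
  public void process(JCas jcas) throws AnalysisEngineProcessException {
    // Detect sentences over jcas.getDocumentText() and add Sentence annotations.
  }

  // Default engine uses the bundled model; the override version takes a custom path.
  public static AnalysisEngineDescription createDefault() throws ResourceInitializationException {
    return AnalysisEngineFactory.createEngineDescription(SentenceAnnotatorSketch.class);
  }

  public static AnalysisEngineDescription createWithModel(String customModelPath)
      throws ResourceInitializationException {
    return AnalysisEngineFactory.createEngineDescription(
        SentenceAnnotatorSketch.class, PARAM_MODEL_PATH, customModelPath);
  }
}

The same parameter could then be exposed in the XML descriptor so a site can point at its own model without any code changes.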
How about this idea for the training/test set:
1) Start with a document with NO newlines. Perhaps just the entire document is
a single paragraph.
2) Then, any sentence detector should be able to parse it correctly.
3) Then, deterministically add newlines to the document: some after
punctuation;
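A minimal sketch of step 3, assuming the injection rule is simply "a real break after every nth sentence-final punctuation mark, plus a hard wrap after every mth space" (the rule and the counts are placeholders for whatever scheme gets agreed on):

public class NewlineInjector {

  // Re-insert line breaks into a newline-free document: a break after every
  // nth sentence-final punctuation mark, plus a hard wrap after every mth space,
  // so the generated text mixes "newline = sentence break" with "newline = wrap".
  public static String injectNewlines(String flatText, int nthPunct, int mthSpace) {
    StringBuilder out = new StringBuilder();
    int punctCount = 0;
    int spaceCount = 0;
    for (int i = 0; i < flatText.length(); i++) {
      char c = flatText.charAt(i);
      out.append(c);
      if (c == '.' || c == '!' || c == '?') {
        punctCount++;
        if (punctCount % nthPunct == 0) {
          out.append('\n');   // newline that coincides with a sentence boundary
        }
      } else if (c == ' ') {
        spaceCount++;
        if (spaceCount % mthSpace == 0) {
          out.append('\n');   // newline in the middle of a sentence
        }
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String flat = "Patient denies chest pain. Lungs are clear. Follow up in two weeks.";
    System.out.println(injectNewlines(flat, 2, 5));
  }
}

Because the breaks are added deterministically to a text whose true sentence boundaries are already known, the same generator can emit a gold standard alongside each formatted variant.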
Sorry, I wasn't clear. I am working on a related project and trying to figure
out if the code can be repurposed for a lab mention annotator for cTAKES. From
what I have seen, test names from different institutions are not standardized,
which makes it hard to standardize the resulting annotation.