Re: De-identified lab tests dataset

2014-09-30 Thread John Green
How large? And across how many EMRs? 


JG
—
Sent from Mailbox

On Mon, Sep 29, 2014 at 6:58 PM, Ajay Jain 
wrote:

> Sorry, I wasn't clear. I am working on a related project and trying to figure 
> out if the code can be repurposed for a lab mention annotator for cTAKES. 
> From what I have seen, test names from different institutions are not 
> standardized, which makes it hard to standardize the resulting annotation. 
> Getting access to a larger lab tests dataset (structured) will help me 
> fine-tune the model. 
>  
> Hope this helps. 
> Ajay
> Sent from my iPhone
>> On Sep 29, 2014, at 2:12 PM, "Savova, Guergana" 
>>  wrote:
>> 
>> Ajay,
>> cTAKES currently does not implement a method to discover labs from the text. 
>> The motivation is that you can get that easily from the structured part of 
>> the EMR (what Pete explained below). Hope this makes sense!
>> --Guergana
>> 
>> -Original Message-
>> From: Peter Szolovits [mailto:p...@mit.edu] 
>> Sent: Monday, September 29, 2014 2:32 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: De-identified lab tests dataset
>> 
>> Ajay, I'm confused by your query.  cTAKES is good at interpreting text, but 
>> most lab test results are reported in tabular form that is most 
>> appropriately searched by SQL queries.  Sometimes lab results are also 
>> reported in narrative notes, but parsing those is often more a matter of 
>> deciphering the text structure of tables than of parsing real English text.  
>> What am I misunderstanding?
>> 
>> --Pete Sz.
>> 
>>> On Sep 29, 2014, at 2:25 PM, Ajay Jain  wrote:
>>> 
>>> Hello All,
>>> 
>>> I am working on a use case for lab tests data using cTAKES and my 
>>> online search to find a test dataset has been futile.  I would greatly 
>>> appreciate it if someone can share such a dataset or can point me in the 
>>> right direction to go looking for one.
>>> 
>>> Best,
>>> Ajay
>>> 
>>> --
>>> Founder & CEO
>>> Mobile Insights, Inc.
>>> (630) 408-8623
>> 

Re: De-identified lab tests dataset

2014-09-30 Thread Ajay Jain
John,

I am in the initial stages of my project and I'll take whatever dataset you are 
able to provide without spending a lot of effort extracting it. 

Thanks.
Ajay

Sent from my iPhone



Re: De-identified lab tests dataset

2014-09-30 Thread John Green
I could pull a dozen or so "sets" of labs from my own personal bank of notes 
that contain various forms of what you would usually call the lab section of a 
SOAP note with minimal effort. I don't mind; it might take me a couple of days 
with work tempo as it is. It's probably all from two different EMRs total, 
though, with a handful of written values in shorthand (e.g. the classic 
fishbones used for things like BNP and CBC), so not a lot of variability but 
maybe enough to start compiling regexes with.
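
As a rough illustration of the regex idea, here is a minimal Python sketch. 
Everything in it is an assumption made for illustration: the synonym table, 
the pattern, and the function name are hypothetical and are not part of 
cTAKES. It also shows one simple way to map differing test names onto a 
single canonical name.

```python
import re

# Hypothetical synonym table: surface forms seen in notes -> one canonical name.
# A real table would be built from the dataset under discussion.
SYNONYMS = {
    "wbc": "WBC",
    "hgb": "Hemoglobin",
    "hb": "Hemoglobin",
    "plt": "Platelets",
    "na": "Sodium",
    "k": "Potassium",
}

# Longest names first so a longer synonym is preferred over a shorter one.
NAME_ALTERNATION = "|".join(
    sorted((re.escape(s) for s in SYNONYMS), key=len, reverse=True)
)

# A known test name, an optional ':' or '=' separator, then a numeric value.
LAB_PATTERN = re.compile(
    rf"\b(?P<name>{NAME_ALTERNATION})\b\s*[:=]?\s*(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_lab_mentions(text):
    """Return (canonical_name, value, start, end) for each lab mention found."""
    mentions = []
    for m in LAB_PATTERN.finditer(text):
        canonical = SYNONYMS[m.group("name").lower()]
        mentions.append((canonical, float(m.group("value")), m.start(), m.end()))
    return mentions


if __name__ == "__main__":
    for mention in find_lab_mentions("WBC 7.2, Hgb: 13.5, Na=140"):
        print(mention)
```

This only covers inline "name value" mentions. Fishbone shorthand and 
table-like layouts would need separate handling, and the character offsets 
are returned because an annotator typically needs the span of each mention.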


If that's helpful and no one else comes along with some free data of a larger 
sort...

Also, there are about 10 notes I committed to the project a year or so ago as 
examples that may have lab data in them.

JG
—
Sent from Mailbox
