Ah, you meant the best way to test.  Sorry, I misread your inquiry as asking 
for the best way to write output.

Yes, that is a great introduction document for cTAKES and early tests.  There 
are a few small test classes in cTAKES that read Anafora files, run cTAKES and 
compute agreement numbers.  You can find some in the ctakes-temporal module.  I 
didn't write them, and I think that they were built for specific purposes, 
but you could try to adapt them to a general-purpose case.  That would 
be a great thing to have in cTAKES!
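
For what it's worth, the agreement numbers themselves are easy to compute once 
the cTAKES output and the gold standard are in the same format.  Below is a 
minimal sketch in Python, not cTAKES code.  It assumes the 
DocName|Spans|CUI|CoveredText|ConceptType layout proposed further down in this 
thread, assumes the span is written as "begin,end", and counts an annotation 
as correct only when document, span and CUI all match exactly.  The function 
names and the sample lines (including the CUIs) are made up for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# (document name, span, CUI): the fields that must all match exactly.
Key = Tuple[str, str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def parse_lines(lines: Iterable[str]) -> Set[Key]:
    """Turn DocName|Spans|CUI|CoveredText|ConceptType lines into match keys."""
    keys: Set[Key] = set()
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Only the first three fields are compared; covered text and
        # concept type are ignored in this sketch.
        doc, span, cui = (field.strip() for field in fields[:3])
        keys.add((doc, span, cui))
    return keys


def score(gold: Set[Key], system: Set[Key]) -> Scores:
    """Exact-match precision, recall and F1 of system output against gold."""
    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)


# Made-up sample data, for illustration only.
gold_lines = [
    "note1|0,8|C0000001|headache|SignSymptom",
    "note1|20,27|C0000002|aspirin|Medication",
    "note2|5,13|C0000003|diabetes|DiseaseDisorder",
]
system_lines = [
    "note1|0,8|C0000001|headache|SignSymptom",       # matches gold
    "note1|20,27|C0000002|aspirin|Medication",       # matches gold
    "note2|5,13|C0000009|diabetes|DiseaseDisorder",  # right span, wrong CUI
    "note2|30,35|C0000004|cough|SignSymptom",        # not in gold
]

result = score(parse_lines(gold_lines), parse_lines(system_lines))
print(f"P={result.precision:.3f} R={result.recall:.3f} F1={result.f1:.3f}")
```

Exact matching is the strictest choice.  If overlapping spans should also 
count, the set intersection would need to be replaced by a span-overlap 
comparison.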

Sean 

-----Original Message-----
From: Leander Melms [mailto:me...@students.uni-marburg.de] 
Sent: Friday, March 17, 2017 1:46 PM
To: dev@ctakes.apache.org
Subject: Re: Evaluate cTAKES perfomance

Hi Sean,

thank you (again) for your help and feedback! I'll give it a try! Seems like 
the authors of the publication "Mayo clinical Text analysis and Knowledge 
Extraction System" 
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/) did this as well.

Thank you
Leander



> On 17 Mar 2017, at 18:33, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Leander,
> 
> There is no single correct way to do this, but a couple of similar 
> classes exist.  Well, one sat in my sandbox for two years until I checked 
> it in about 5 seconds ago.  Anyway, take a look at two classes in 
> ctakes-core under org.apache.ctakes.core.  They are TextSpanWriter and 
> CuiCountFileWriter.
> 
> TextSpanWriter writes annotation name | span | covered text to a file, one 
> file per document.
> 
> CuiCountFileWriter writes a list of discovered CUIs and their counts.
> 
> It sounds like you are interested in a combination of both - basically 
> TextSpanWriter with the added output of CUIs.
> 
> You can also have a look at EntityCollector in 
> org.apache.ctakes.core.pipeline.  It has an annotation engine that keeps a 
> running list of "entities" for the whole run: doc ids, spans, text and CUIs.
> 
> Sean
> 
> 
> -----Original Message-----
> From: Leander Melms [mailto:me...@students.uni-marburg.de]
> Sent: Friday, March 17, 2017 1:09 PM
> To: dev@ctakes.apache.org
> Subject: Re: Evaluate cTAKES perfomance
> 
> Sorry for writing again. I just have a quick question: My idea is to write 
> the cTAKES output to a text file with a structure like this 
> DocName|Spans|CUI|CoveredText|ConceptType and do the same with the gold 
> standard (from Anafora). 
> 
> Is this a correct way to do this? 
> 
> I'm new to the subject and happy about the tiniest information on the topic.
> 
> Thanks
> Leander
> 
>> On 17 Mar 2017, at 12:05, Leander Melms <me...@students.uni-marburg.de> 
>> wrote:
>> 
>> Hi,
>> 
>> I've integrated a custom dictionary, retrained some of the OpenNLP models 
>> and would like to evaluate the changes on a gold standard. I'd like to 
>> calculate the precision, the recall and the f1-score to compare the results.
>> 
>> My question is: Does cTAKES ship with some evaluation / test scripts? What 
>> is the best strategy to do this? Has anyone dealt with this topic before? 
>> 
>> I'm happy to share the results afterwards if there is interest for it.
>> 
>> Thanks
>> Leander
>> 
> 
> 
