Hi Leander,

There is no single correct way to do this, but a couple of similar classes 
exist.  Well, one sat in my sandbox for two years until about 5 seconds ago, 
when I finally checked it in.  Anyway, take a look at two classes in 
ctakes-core, under org.apache.ctakes.core: TextSpanWriter and 
CuiCountFileWriter.

TextSpanWriter writes annotation name | span | covered text to a file, one 
file per document.

CuiCountFileWriter writes a list of discovered CUIs and their counts.

It sounds like you are interested in a combination of both - basically 
TextSpanWriter with the added output of CUIs.

You can also have a look at EntityCollector in 
org.apache.ctakes.core.pipeline.  It has an annotation engine that keeps a 
running list of "entities" for the whole run: doc IDs, spans, text and CUIs.
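
For the comparison step itself, a minimal sketch in plain Java (no cTAKES 
dependency) might look like the following.  It assumes that both the system 
output and the gold standard have already been written as 
DocName|Spans|CUI|CoveredText|ConceptType lines, and that a system line counts 
as correct only when document, span and CUI all match a gold line exactly.  
The class name SpanCuiScorer and its methods are made up for illustration and 
are not part of cTAKES.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a cTAKES class: scores system lines against gold lines.
final class SpanCuiScorer {

    private SpanCuiScorer() {}

    // Reduce a full line to the comparison key DocName|Spans|CUI.
    // Covered text and concept type are ignored (an assumption of this sketch).
    public static String toKey(String line) {
        String[] fields = line.trim().split("\\|", -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        return fields[0].trim() + "|" + fields[1].trim() + "|" + fields[2].trim();
    }

    // Collect the distinct keys of all non-blank lines.
    private static Set<String> toKeys(Collection<String> lines) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                keys.add(toKey(line));
            }
        }
        return keys;
    }

    // Returns { precision, recall, f1 } using exact key matching.
    public static double[] score(Collection<String> systemLines,
                                 Collection<String> goldLines) {
        Set<String> system = toKeys(systemLines);
        Set<String> gold = toKeys(goldLines);

        // True positives: keys found by the system that are also in the gold set.
        Set<String> truePositives = new HashSet<>(system);
        truePositives.retainAll(gold);
        double tp = truePositives.size();

        double precision = system.isEmpty() ? 0.0 : tp / system.size();
        double recall = gold.isEmpty() ? 0.0 : tp / gold.size();
        double sum = precision + recall;
        double f1 = sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
        return new double[] { precision, recall, f1 };
    }
}
```

Calling SpanCuiScorer.score(systemLines, goldLines) with the lines read from 
the two files gives the three numbers.  Exact matching is the strictest 
choice; crediting overlapping spans would need a different key comparison.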

Sean


-----Original Message-----
From: Leander Melms [mailto:me...@students.uni-marburg.de] 
Sent: Friday, March 17, 2017 1:09 PM
To: dev@ctakes.apache.org
Subject: Re: Evaluate cTAKES perfomance

Sorry for writing again. I just have a quick question: My idea is to write the 
cTAKES output to a text file with a structure like this 
DocName|Spans|CUI|CoveredText|ConceptType and do the same with the gold 
standard (from Anafora). 

Is this a correct way to do this? 

I'm new to the subject and grateful for even the smallest pointer on the topic.

Thanks
Leander

> On 17 Mar 2017, at 12:05, Leander Melms <me...@students.uni-marburg.de> wrote:
> 
> Hi,
> 
> I've integrated a custom dictionary, retrained some of the OpenNLP models and 
> would like to evaluate the changes on a gold standard. I'd like to calculate 
> the precision, the recall and the f1-score to compare the results.
> 
> My question is: Does cTAKES ship with some evaluation / test scripts? What is 
> the best strategy to do this? Has anyone dealt with this topic before? 
> 
> I'm happy to share the results afterwards if there is interest in them.
> 
> Thanks
> Leander
> 
