This is exactly what I was looking for. I read your answer a little late, though, and I have written a Python script that outputs something like this:
Strict Mode (#2)
╭───────────┬───────────┬─────────┬──────────╮
│ iteration │ precision │ recall  │ f1 score │
├───────────┼───────────┼─────────┼──────────┤
│         0 │   0.66667 │ 0.33333 │  0.44444 │
│         1 │   0.66667 │ 0.45455 │  0.54054 │
╰───────────┴───────────┴─────────┴──────────╯

╭───────────┬────────────────┬────────┬────────╮
│ Measure   │ Macro (SD)     │ Micro  │ F1     │
├───────────┼────────────────┼────────┼────────┤
│ Precision │ 0.6667 (0.0)   │ 0.6667 │ 0.4952 │
│ Recall    │ 0.3939 (0.061) │ 0.3846 │ 0.4878 │
╰───────────┴────────────────┴────────┴────────╯

It needs some testing and a clean-up. I'll create a git repo once it's done!
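Until the repo is up, here is a rough sketch of how the macro and micro columns above can be computed. The per-iteration counts are made up for illustration, and the actual script may organise things differently:

from statistics import mean, pstdev

def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Made-up (tp, fp, fn) counts per iteration; the real script derives
# these by matching the gold standard against the cTAKES output.
iterations = [(2, 1, 4), (10, 5, 12)]

# Macro average: score each iteration separately, then average the
# scores (the SD column would then be the spread of those scores).
scores = [prf(*c) for c in iterations]
macro_precision = mean(p for p, _, _ in scores)
macro_precision_sd = pstdev(p for p, _, _ in scores)

# Micro average: pool the raw counts over all iterations first, then
# score the totals once.
tp, fp, fn = (sum(col) for col in zip(*iterations))
micro_precision, micro_recall, micro_f1 = prf(tp, fp, fn)

Macro weights every iteration equally; micro weights every annotation equally, so iterations with more annotations dominate.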
Leander

> On 19 Mar 2017, at 16:55, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> Great explanation,
> Thank you Tim!
>
> -----Original Message-----
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Saturday, March 18, 2017 7:18 AM
> To: dev@ctakes.apache.org
> Subject: Re: Evaluate cTAKES performance
>
> To save you a little trouble: in ctakes-temporal we rely heavily on an outside
> library called ClearTK, which has evaluation APIs built in that work well with
> UIMA frameworks and typical NLP tasks. We use the following classes:
> http://cleartk.github.io/cleartk/apidocs/2.0.0/org/cleartk/eval/AnnotationStatistics.html
> http://cleartk.github.io/cleartk/apidocs/2.0.0/org/cleartk/eval/Evaluation_ImplBase.html
>
> The simplest place to start looking in ctakes-temporal is probably the
> EventAnnotator and its evaluation, since events are simple one-word spans. The
> TimeAnnotator is slightly more complicated, with multi-word spans. If you are
> interested in evaluating relations, I would suggest switching over to
> ctakes-relation-extractor, which is more stable than the ctakes-temporal
> relation code; the latter is an area of highly active (i.e., funded) research,
> so that code has not been cleaned up as much.
> Tim
>
> ________________________________________
> From: Leander Melms <me...@students.uni-marburg.de>
> Sent: Friday, March 17, 2017 3:05 PM
> To: dev@ctakes.apache.org
> Subject: Re: Evaluate cTAKES performance
>
> Thanks! I'll have a look at it and will try to give something back to the
> community!
>
> Leander
>
>
>> On 17 Mar 2017, at 19:42, Finan, Sean <sean.fi...@childrens.harvard.edu>
>> wrote:
>>
>> Ah - you meant the best way to test. Sorry, I misread your inquiry as the
>> best way to write output.
>>
>> Yes, that is a great introduction document for cTAKES and early tests. There
>> are a few small test classes in cTAKES that read Anafora files, run cTAKES
>> and compute agreement numbers. You can find some in the ctakes-temporal
>> module. I didn't write them, and I think that they are built-to-fit,
>> purpose-driven classes, but you could try to adapt them to a general-purpose
>> case. That would be a great thing to have in cTAKES!
>>
>> Sean
>>
>> -----Original Message-----
>> From: Leander Melms [mailto:me...@students.uni-marburg.de]
>> Sent: Friday, March 17, 2017 1:46 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: Evaluate cTAKES performance
>>
>> Hi Sean,
>>
>> Thank you (again) for your help and feedback! I'll give it a try! Seems like
>> the authors of the publication "Mayo clinical Text Analysis and Knowledge
>> Extraction System (cTAKES)"
>> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/) did this as well.
>>
>> Thank you
>> Leander
>>
>>
>>> On 17 Mar 2017, at 18:33, Finan, Sean <sean.fi...@childrens.harvard.edu>
>>> wrote:
>>>
>>> Hi Leander,
>>>
>>> There is no single correct way to do this, but a couple of similar classes
>>> exist. Well, one sat in my sandbox for two years until about 5 seconds ago,
>>> as I only just checked it in. Anyway, take a look at two classes in
>>> ctakes-core, package org.apache.ctakes.core: TextSpanWriter and
>>> CuiCountFileWriter.
>>>
>>> TextSpanWriter writes annotation name | span | covered text in a file, one
>>> file per document.
>>>
>>> CuiCountFileWriter writes a list of discovered CUIs and their counts.
>>>
>>> It sounds like you are interested in a combination of both - basically
>>> TextSpanWriter with the added output of CUIs.
>>>
>>> You can also have a look at EntityCollector in
>>> org.apache.ctakes.core.pipeline. It has an annotation engine that keeps a
>>> running list of "entities" for the whole run: doc IDs, spans, text and CUIs.
>>>
>>> Sean
>>>
>>> -----Original Message-----
>>> From: Leander Melms [mailto:me...@students.uni-marburg.de]
>>> Sent: Friday, March 17, 2017 1:09 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: Evaluate cTAKES performance
>>>
>>> Sorry for writing again. I just have a quick question: my idea is to parse
>>> the cTAKES output into a text file with a structure like
>>> DocName|Spans|CUI|CoveredText|ConceptType and to do the same with the gold
>>> standard (from Anafora).
>>>
>>> Is this a correct way to do this?
>>>
>>> I'm new to the subject and grateful for even the tiniest bit of information
>>> on the topic.
>>>
>>> Thanks
>>> Leander
>>>
>>>> On 17 Mar 2017, at 12:05, Leander Melms <me...@students.uni-marburg.de>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've integrated a custom dictionary, retrained some of the OpenNLP models
>>>> and would like to evaluate the changes against a gold standard. I'd like
>>>> to calculate the precision, the recall and the F1 score to compare the
>>>> results.
>>>>
>>>> My question is: does cTAKES ship with some evaluation / test scripts? What
>>>> is the best strategy to do this? Has anyone dealt with this topic before?
>>>>
>>>> I'm happy to share the results afterwards if there is interest.
>>>>
>>>> Thanks
>>>> Leander
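P.S. For anyone finding this thread later: a minimal sketch of the strict comparison I proposed above, assuming one gold file and one system file in the DocName|Spans|CUI|CoveredText|ConceptType layout from my earlier mail (the file handling and field choices are illustrative, not from any cTAKES tool):

import sys

def load(path):
    """Read DocName|Spans|CUI|CoveredText|ConceptType lines and key each
    annotation by (doc, spans, cui); strict matching ignores the text."""
    with open(path, encoding="utf-8") as f:
        return {tuple(line.rstrip("\n").split("|")[:3])
                for line in f if line.strip()}

gold = load(sys.argv[1])    # e.g. exported from the Anafora gold standard
system = load(sys.argv[2])  # e.g. parsed from the cTAKES output

tp = len(gold & system)     # annotations found in both files
fp = len(system - gold)     # system annotations with no gold match
fn = len(gold - system)     # gold annotations the system missed

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if precision + recall else 0.0)
print("precision=%.4f recall=%.4f f1=%.4f" % (precision, recall, f1))

Keying on (doc, spans, cui) in a set collapses duplicate annotations; if duplicates should count, a multiset would be needed instead.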