Hi Sean, Thanks! I'll give it a whirl and let you know how it works out. Best!
On Mon, Jan 25, 2021 at 8:48 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Greg, Peter,
>
> I believe that the performance report comes from a
> CollectionProcessingEngine (CPE):
> https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
>
> I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> tool's name, but that may have changed in recent years.
>
> The PipelineBuilder class in ctakes.core used by the PiperFileRunner could
> be changed to use this style of running a single-threaded pipeline - right
> now it uses a simpler UIMAFit method.
> The code changes are relatively minor, but obviously significant testing
> would be required. The ctakes PipelineBuilder does use a CPE for
> multi-threaded pipelines, so there has already been some testing on that
> front.
>
> You can look at the ctakes PipelineBuilder run() method. If you get rid
> of the if (threadCount==1) {..} else { then the CPE will always be used.
> Then just add a cpe.getPerformanceReport() after cpe.process() and you
> should have a ProcessTrace object. This is where my guessing ends as I have
> never used a ProcessTrace and don't know exactly what to beg of it.
>
> I hope that is a decent start,
> Sean
> ________________________________________
> From: Greg Silverman <g...@umn.edu.INVALID>
> Sent: Saturday, January 23, 2021 3:01 PM
> To: dev@ctakes.apache.org
> Subject: Re: performance report [EXTERNAL]
>
> Hi Peter,
> I have no doubt about performance differences regarding variance between
> note styles and pipeline components.
>
> We're looking for a way to benchmark the standard/non-customized pipeline
> performance for processing a largish set of identical notes using several
> clinical NLP annotators (specifically, ctakes, biomedicus, metamap and
> clamp).
> At the command line, both metamap and biomedicus output a standard
> performance report with total timings and the details for each specific
> pipeline component. I assume there is a way to enable the performance
> report output available in the GUI version of ctakes at the command line -
> which is what I'm really interested in.
>
> We're fine with information at a very coarse level, since we're interested
> in a particular note type, so the aforementioned report should be
> sufficient. I'm just wondering how to enable it using the standard pipeline
> in cTAKES.
>
> Thanks!
>
> Greg--
>
> On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch <pabramowit...@gmail.com>
> wrote:
>
> > Hi Greg,
> >
> > I’ve found that there’s so much difference between note styles that have
> > performance implications, and so many interactions between pipeline
> > configurations which affect overall performance, that really the only way
> > to get a sense of performance is either on a very coarse level, measuring
> > process time across large collections of varied notes, or very granular,
> > using something like jvisualvm. Using the latter I saw some surprising
> > things, some of which I was able to tackle with minor software changes,
> > while others are deep in UIMA utilities used by cTAKES. The biggest
> > factor in my experience, after processing millions of notes, is when they
> > have reached about 5k AND are missing punctuation. At around this size
> > begins a geometric rise in complexity of internal structures that depend
> > on sentences and a serious elevation of processing time.
> >
> > Peter
> >
> > > On Jan 23, 2021, at 18:09, Greg Silverman <g...@umn.edu.invalid> wrote:
> > >
> > > I found this:
> > > https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2
> > > which states: "A performance report is generated when the process is done."
> > >
> > > However, we are running this from the command line and no such report is
> > > being generated.
> > >
> > > Thanks!
> > >
> > >> On Sat, Jan 23, 2021 at 11:05 AM Greg Silverman <g...@umn.edu> wrote:
> > >>
> > >> Hi all,
> > >> Is there a way to easily generate a performance report similar to the one
> > >> generated by MetaMap (with timings for each task, etc.)?
> > >>
> > >> Thanks in advance!
> > >>
> > >> Greg--
> > >>
> > >> --
> > >> Greg M. Silverman
> > >> Senior Systems Developer
> > >> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > >> Department of Surgery
> > >> University of Minnesota
> > >> g...@umn.edu

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
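
For the coarse, per-component timings asked about in this thread, one stopgap that avoids touching PipelineBuilder is to time each stage by hand and print the totals. The sketch below is only an illustration under stated assumptions: StageTimer, its methods, and the stage names "tokenize" and "count" are hypothetical, it uses only the Java standard library, and it does not call the UIMA CPE, ProcessTrace, or any cTAKES class. A real run would wrap each annotator call where the stand-in lambdas are.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of cTAKES or UIMA): accumulates wall-clock time per named stage. */
final class StageTimer {
    // LinkedHashMap keeps stages in first-seen order, so the report mirrors pipeline order.
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();
    private final Map<String, Integer> callsByStage = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the stage total, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Recorded even if the work throws, so failed notes still count.
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
            callsByStage.merge(stage, 1, Integer::sum);
        }
    }

    /** Number of times the stage was timed; 0 for an unknown stage. */
    int calls(String stage) {
        return callsByStage.getOrDefault(stage, 0);
    }

    /** Sum of elapsed nanoseconds over all stages. */
    long totalNanos() {
        long total = 0;
        for (long n : nanosByStage.values()) {
            total += n;
        }
        return total;
    }

    /** One line per stage (call count and total milliseconds), then a grand total. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            sb.append(String.format("%-12s calls=%d total=%.3f ms%n",
                    e.getKey(), calls(e.getKey()), e.getValue() / 1e6));
        }
        sb.append(String.format("%-12s total=%.3f ms%n", "ALL", totalNanos() / 1e6));
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        String[] notes = {"Patient denies chest pain.", "No acute distress noted."};
        for (String note : notes) {
            // Stand-in stages; a real benchmark would invoke each annotator here.
            String[] tokens = timer.time("tokenize", () -> note.split("\\s+"));
            timer.time("count", () -> tokens.length);
        }
        System.out.print(timer.report());
    }
}
```

Timings from a harness like this include JVM warm-up, so discarding the first pass over the notes before reading the numbers is probably wise when comparing against the other annotators.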