Great, thanks Greg. I'd like to see the kind of stats that are available beyond what one can scrape from log4j.
Peter

On Mon, Jan 25, 2021 at 5:16 PM Greg Silverman <g...@umn.edu.invalid> wrote:

> Hi Sean,
> Thanks! I'll give it a whirl and let you know how it works out.
>
> Best!
>
> On Mon, Jan 25, 2021 at 8:48 AM Finan, Sean
> <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Greg, Peter,
> >
> > I believe that the performance report comes from a
> > CollectionProcessingEngine (CPE):
> > https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
> >
> > I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> > tool's name, but that may have changed in recent years.
> >
> > The PipelineBuilder class in ctakes.core used by the PiperFileRunner
> > could be changed to use this style of running a single-threaded
> > pipeline - right now it uses a simpler UIMAFit method.
> > The code changes are relatively minor, but obviously significant
> > testing would be required. The ctakes PipelineBuilder does use a CPE
> > for multi-threaded pipelines, so there has already been some testing
> > on that front.
> >
> > You can look at the ctakes PipelineBuilder run() method. If you get
> > rid of the if (threadCount==1) {..} else { then the CPE will always be
> > used. Then just add a cpe.getPerformanceReport() after cpe.process()
> > and you should have a ProcessTrace object. This is where my guessing
> > ends, as I have never used a ProcessTrace and don't know exactly what
> > to beg of it.
> >
> > I hope that is a decent start,
> > Sean
> > ________________________________________
> > From: Greg Silverman <g...@umn.edu.INVALID>
> > Sent: Saturday, January 23, 2021 3:01 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: performance report [EXTERNAL]
> >
> > Hi Peter,
> > I have no doubt about performance differences regarding variance
> > between note styles and pipeline components.
> >
> > We're looking for a way to benchmark the standard/non-customized
> > pipeline performance for processing a largish set of identical notes
> > using several clinical NLP annotators (specifically, ctakes,
> > biomedicus, metamap and clamp). At the command line, both metamap and
> > biomedicus output a standard performance report with total timings and
> > the details for each specific pipeline component. I assume there is a
> > way to enable the performance report output available in the GUI
> > version of ctakes at the command line - which is what I'm really
> > interested in.
> >
> > We're fine with information at a very coarse level, since we're
> > interested in a particular note type, so the aforementioned report
> > should be sufficient. I'm just wondering how to enable it using the
> > standard pipeline in cTAKES.
> >
> > Thanks!
> >
> > Greg--
> >
> > On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch
> > <pabramowit...@gmail.com> wrote:
> >
> > > Hi Greg,
> > >
> > > I’ve found that there’s so much difference between note styles that
> > > have performance implications, and so many interactions between
> > > pipeline configurations which affect overall performance, that
> > > really the only way to get a sense of performance is either at a
> > > very coarse level, measuring process time across large collections
> > > of varied notes, or at a very granular level using something like
> > > jvisualvm. Using the latter I saw some surprising things, some of
> > > which I was able to tackle with minor software changes, while others
> > > are deep in UIMA utilities used by cTakes. The biggest factor, in my
> > > experience after processing millions of notes, is notes that have
> > > reached about 5k AND are missing punctuation. At around this size
> > > begins a geometric rise in complexity of internal structures that
> > > depend on sentences and a serious elevation of processing time.
> > >
> > > Peter
> > >
> > > > On Jan 23, 2021, at 18:09, Greg Silverman <g...@umn.edu.invalid>
> > > > wrote:
> > > >
> > > > I found this:
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__medium.com_-40felix-5Fchan_install-2Dapache-2Dctakes-2D924c40967ce2&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=uuvD9Z5PgR1KUWZ1Dc80V19dfKcr2DTrMuBxe2OCbMc&s=s-jUaTKHh4ts1f2UzY5nHsKbjA27HDpqAchBF36juTI&e=
> > > > , which states: "A performance report is generated when the
> > > > process is done."
> > > >
> > > > However, we are running this from the command line and no such
> > > > report is being generated.
> > > >
> > > > Thanks!
> > > >
> > > > > On Sat, Jan 23, 2021 at 11:05 AM Greg Silverman <g...@umn.edu>
> > > > > wrote:
> > > > >
> > > > > Hi all,
> > > > > Is there a way to easily generate a performance report similar
> > > > > to the one generated by MetaMap (with timings for each task,
> > > > > etc.)?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > > > Greg--
> > > > >
> > > > > --
> > > > > Greg M. Silverman
> > > > > Senior Systems Developer
> > > > > NLP/IE <https://urldefense.proofpoint.com/v2/url?u=https-3A__healthinformatics.umn.edu_research_nlpie-2Dgroup&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=uuvD9Z5PgR1KUWZ1Dc80V19dfKcr2DTrMuBxe2OCbMc&s=5Kgux8IKOmsj2xjj7DxAhKZf6anK7HF3ddsOhnI1VFM&e=>
> > > > > Department of Surgery
> > > > > University of Minnesota
> > > > > g...@umn.edu
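
For anyone picking up Sean's suggestion in the thread (run the pipeline through a CPE, then call cpe.getPerformanceReport()), here is a rough sketch of how per-component timings could be rolled up once the ProcessTrace is in hand. It is an illustration only, not cTAKES or UIMA code: the Event class is a hypothetical stand-in so the sketch compiles without UIMA on the classpath, and the component names are made up. With UIMA available, the corresponding calls should be ProcessTrace.getEvents() and, on each ProcessTraceEvent, getComponentName(), getDuration() and getSubEvents().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TraceSummary {

    /** Hypothetical stand-in for the parts of a UIMA ProcessTraceEvent used here. */
    public static final class Event {
        final String componentName;
        final int durationMillis;
        final List<Event> subEvents;

        public Event(String componentName, int durationMillis, Event... subEvents) {
            this.componentName = componentName;
            this.durationMillis = durationMillis;
            this.subEvents = Arrays.asList(subEvents);
        }
    }

    /** Sums durations per component name, walking nested sub-events depth-first. */
    public static Map<String, Integer> totalsByComponent(List<Event> events) {
        // LinkedHashMap keeps components in the order they were first seen.
        Map<String, Integer> totals = new LinkedHashMap<>();
        collect(events, totals);
        return totals;
    }

    private static void collect(List<Event> events, Map<String, Integer> totals) {
        for (Event e : events) {
            totals.merge(e.componentName, e.durationMillis, Integer::sum);
            collect(e.subEvents, totals);
        }
    }

    public static void main(String[] args) {
        // Made-up events standing in for what a real trace might contain.
        List<Event> events = Arrays.asList(
                new Event("SentenceDetector", 120),
                new Event("Tokenizer", 45),
                new Event("SentenceDetector", 80));
        for (Map.Entry<String, Integer> entry : totalsByComponent(events).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " ms");
        }
    }
}
```

Totals are keyed by component name, so a component that appears both at the top level and as a sub-event is counted in both places. Since a parent event's duration most likely already includes its sub-events, the per-component totals should not be added together to get an overall run time.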