Hi Peter,

I have no doubt that performance varies with note style and pipeline components.
We're looking for a way to benchmark standard (non-customized) pipeline performance when processing a largish set of identical notes with several clinical NLP annotators (specifically cTAKES, BioMedICUS, MetaMap and CLAMP). At the command line, both MetaMap and BioMedICUS output a standard performance report with total timings and details for each pipeline component. I assume there is a way to enable, at the command line, the performance report that the GUI version of cTAKES produces, and that is what I'm really interested in. We're fine with information at a very coarse level, since we're only interested in one particular note type, so that report should be sufficient. I'm just wondering how to enable it using the standard pipeline in cTAKES.

Thanks!

Greg

On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

> Hi Greg,
>
> I’ve found that there’s so much difference between note styles that have
> performance implications and so many interactions between pipeline
> configurations which affect overall performance, that really the only way
> to get a sense of performance is either on a very coarse level, measuring
> process time across large collections of varied notes, or very granular
> using something like jvisualvm. Using the latter I saw some surprising
> things, some of which I was able to tackle with minor software changes,
> while others are deep in UIMA utilities used by cTAKES. The biggest
> factor in my experience after processing millions of notes is after they
> have reached about 5k AND are missing punctuation. At around this size
> begins a geometric rise in complexity of internal structures that depend on
> sentences and a serious elevation of processing time.
>
> Peter
>
> Sent from my iPad
>
> > On Jan 23, 2021, at 18:09, Greg Silverman <g...@umn.edu.invalid> wrote:
> >
> > I found this:
> > https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2, which
> > states: "A performance report is generated when the process is done."
> >
> > However, we are running this from the command line and no such report is
> > being generated.
> >
> > Thanks!
> >
> >> On Sat, Jan 23, 2021 at 11:05 AM Greg Silverman <g...@umn.edu> wrote:
> >>
> >> Hi all,
> >> Is there a way to easily generate a performance report similar to the one
> >> generated by MetaMap (with timings for each task, etc.)?
> >>
> >> Thanks in advance!
> >>
> >> Greg

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu