Hi,

We’ve been working with Tibor and a few other csit-dev and vpp-dev folks on a design to improve VPP performance trend tracking and automate anomaly detection (regression/progression). The result of this work is a design write-up published here: https://wiki.fd.io/view/CSIT/PerformanceTrendingAnalysis

We’re now working on a POC to put two things in place:

1. PT - performance trending
   a. improved version of the existing csit performance trending jobs.
   b. initially measuring only MRR (maximum received rate), with unlimited tolerance of packet loss, to measure vpp code efficiency (vpp vector size at max), and because it’s fast.
   c. optimized PDR and NDR searches will be added later.
   d. jobs should start running today/tomorrow, with daily cadence, 1 run/testbed/day.

2. PA - performance analysis
   a. automated trend analysis and anomaly detection.
   b. failed jobs signal regressions.
   c. anomaly detection based on the usual statistical metrics: trimmed moving avg, stdev, mean.
      - efficiency of detection to be tuned - see the link above for a description.
   d. funky graphs summarizing the trend and identifying performance regressions and progressions - daily, weekly, monthly views.
   e. concept code is already generating some graphs - can show them on the vpp-dev and csit-dev calls for feedback.

Hope folks find it useful - looking for comments and feedback.

Cheers,
-Maciek