Hello everyone. I am working on automating the "git bisect" process, mainly for locating performance regressions and progressions (it is also usable for locating breakages and fixes).
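
At its core, the automation is just a measurement wrapper handed to "git bisect run". A minimal sketch of the idea follows; the build command, the run_mrr_trial.sh helper and the 6.0 Mpps threshold are placeholders for illustration, not the actual CSIT code:

#!/usr/bin/env python3
"""Hypothetical wrapper, invoked as: git bisect run ./bisect_trial.py

"git bisect run" interprets the exit code of the command:
  0                   -> the checked-out revision is "good"
  1..127 (except 125) -> the revision is "bad"
  125                 -> the revision cannot be tested (skip it)
"""
import statistics
import subprocess
import sys

THRESHOLD_PPS = 6.0e6  # placeholder: rough MRR expected from a "good" build
TRIALS = 5             # median of several trials to dampen noisy outliers


def build_vpp() -> bool:
    """Build the checked-out revision; failure means "skip this commit"."""
    return subprocess.run(["make", "build-release"]).returncode == 0


def run_one_trial() -> float:
    """Placeholder: the real script would drive the traffic generator here
    and return the measured packets per second of one MRR trial."""
    out = subprocess.run(["./run_mrr_trial.sh"],  # hypothetical helper
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())


if __name__ == "__main__":
    if not build_vpp():
        sys.exit(125)
    median_pps = statistics.median(run_one_trial() for _ in range(TRIALS))
    sys.exit(0 if median_pps >= THRESHOLD_PPS else 1)

Taking a median over several trials dampens noise a bit, but only if the trials behave like independent samples from a single distribution.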
Of course, the process works correctly only if the performance results are stable enough. And we know from the per-patch perf VPP verify job that many testcases are not reliable enough: something other than code quality alone is influencing the results, and repeated trials do not seem independent enough for the usual statistical methods to work.

While testing the "bisection script" prototype, I noticed that results from the Denverton platform show a smaller spread across trials, enabling the script to locate smaller regressions reliably. But when I tried the ip4base testcase (2-node testbed, meaning one traffic generator and one physical machine running VPP), I found very surprising results. The full log is at [0], but here are the results (MRR, number of 64B packets received in 1 second under line-rate load) of 60 trials run back-to-back on the same VPP build and testbed:

[6058929.0, 6044436.0, 6051129.0, 6070631.0, 6056215.0, 6061268.0, 6057762.0, 6059699.0, 6063921.0, 6066904.0, 6051627.0, 6055370.0, 6047920.0, 6069624.0, 6054088.0, 6055737.0, 6047438.0, 6047390.0, 6060160.0, 6052960.0, 6056360.0, 6055028.0, 6045457.0, 6060301.0, 6058869.0, 6059033.0, 6059880.0, 9712980.0, 8810073.0, 6050160.0, 6063784.0, 6057699.0, 6061905.0, 6059174.0, 6061494.0, 6057585.0, 6043699.0, 6045381.0, 6048290.0, 6051779.0, 9009111.0, 8817494.0, 8847234.0, 7014022.0, 7385958.0, 10867843.0, 10991701.0, 10926844.0, 10971236.0, 6056055.0, 6048881.0, 6059600.0, 6037948.0, 6047664.0, 6057797.0, 6053424.0, 6057050.0, 6044720.0, 6042256.0, 6054110.0]

The ~6.05 Mpps values are consistent and usually seen on every VPP build, but the outlying values (roughly 7 to 11 Mpps; see the quick summary in the PS below) can reach almost double that. Such values are quite rare. The VPP build which produced the outliers comes from a change which only adds a "make test" test, so I believe earlier builds can also produce them; 60 trials were just not enough for the outliers to show up.

Has anybody seen similar results? Has anybody got an idea of what could be happening inside VPP? Can we fix VPP to be more consistent (ideally at the higher performance)?

Vratko.

[0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-2n-dnv/6/console.log.gz
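
PS: For anyone who wants the split above as numbers rather than eyeballing, here is a quick plain-Python summary of the 60 trials (values copied from this mail, nothing VPP-specific):

import statistics

# The 60 back-to-back MRR trials listed above, in packets per second.
trials = [
    6058929, 6044436, 6051129, 6070631, 6056215, 6061268, 6057762, 6059699,
    6063921, 6066904, 6051627, 6055370, 6047920, 6069624, 6054088, 6055737,
    6047438, 6047390, 6060160, 6052960, 6056360, 6055028, 6045457, 6060301,
    6058869, 6059033, 6059880, 9712980, 8810073, 6050160, 6063784, 6057699,
    6061905, 6059174, 6061494, 6057585, 6043699, 6045381, 6048290, 6051779,
    9009111, 8817494, 8847234, 7014022, 7385958, 10867843, 10991701,
    10926844, 10971236, 6056055, 6048881, 6059600, 6037948, 6047664,
    6057797, 6053424, 6057050, 6044720, 6042256, 6054110,
]

low = [t for t in trials if t < 6.5e6]    # the consistent ~6.05 Mpps mode
high = [t for t in trials if t >= 6.5e6]  # the surprising outliers

print(f"low mode : {len(low)} trials, "
      f"median {statistics.median(low) / 1e6:.2f} Mpps, "
      f"spread {(max(low) - min(low)) / 1e6:.3f} Mpps")
print(f"outliers : {len(high)} trials, "
      f"from {min(high) / 1e6:.2f} to {max(high) / 1e6:.2f} Mpps")

The 6.5 Mpps cut-off is arbitrary; it just separates the tight low mode from everything else.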