Hi,

No experience with pytest-benchmark, but I maintain an unrelated project that runs some benchmarks on CI, and here are some things worth mentioning:
- we store the results as a newline-delimited JSON file in a different GitHub repository (https://raw.githubusercontent.com/rust-analyzer/metrics/master/metrics.json, warning, it's a 5.5 MB unformatted JSON); a minimal sketch of this append-one-line-per-run approach is included at the end of this message
- we have an in-browser dashboard that retrieves the whole file and displays the results: https://rust-analyzer.github.io/metrics/
- we do track build time and overall run time, but we're more interested in correctness
- the display is a bit of a mess (partly due to trying to keep the setup as simple as possible), but you can look for "total time", "total memory" and "build" to get an idea
- we store the runner CPU type and memory in that JSON; the runners are almost all Intel, but they do get upgraded from time to time
- we even have two AMD EPYC runs; note that boost is disabled in a different way there (we don't try to disable it, though)
- we also try to measure the CPU instruction count (the perf counter), but that doesn't work on GitHub and probably not in most VMs
- the runners have been very reliable, but not really consistent in performance
- a bigger problem for us was that somebody actually needs to look at the dashboard to spot any regressions and investigate them (some are caused by external changes)
- in 3-5 years we'll probably have to trim down the JSON or switch to a different storage

Laurentiu

On Tue, Oct 10, 2023, at 21:08, Even Rouault via gdal-dev wrote:
> Hi,
>
> I'm experimenting with adding performance regression testing to our CI.
> Currently our CI has quite extensive functional coverage, but it totally
> lacks performance testing. Given that we use pytest, I've spotted
> pytest-benchmark (https://pytest-benchmark.readthedocs.io/en/latest/) as
> a likely good candidate framework.
>
> I've prototyped things in https://github.com/OSGeo/gdal/pull/8538
>
> Basically, we now have an autotest/benchmark directory where performance
> tests can be written.
>
> Then in the CI, we check out a reference commit, build it and run the
> performance test suite in --benchmark-save mode.
>
> And then we run the performance test suite on the PR in
> --benchmark-compare mode with a --benchmark-compare-fail="mean:5%"
> criterion (which means that a test fails if its mean runtime is 5%
> slower than the reference one).
>
> From what I can see, pytest-benchmark behaves correctly if tests are
> removed or added (that is, not failing, just skipping them during
> comparison). The only thing one should not do is modify an existing test
> w.r.t. the reference branch.
>
> Does someone have practical experience with pytest-benchmark, in particular
> in CI setups? With virtualization, it is hard to guarantee that other
> things happening on the host running the VM won't interfere. Even
> locally on my own machine, I initially saw strong variations in timings,
> which can be reduced to an acceptable deviation by disabling the Intel
> Turbo Boost feature (echo 1 | sudo tee
> /sys/devices/system/cpu/intel_pstate/no_turbo).
>
> Even
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
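For reference, a rough sketch of what a test under autotest/benchmark could look like with pytest-benchmark, together with the save/compare invocations from Even's message. The workload below is a made-up placeholder (a real test would exercise a GDAL code path), and the command lines are only an approximation of the setup described in the PR:

    import pytest

    # skip cleanly on builds where the plugin isn't installed
    pytest.importorskip("pytest_benchmark")


    def parse_many_numbers():
        # placeholder workload, not an actual GDAL benchmark
        return sum(float(x) for x in ("1.5 2.5 3.5 " * 10000).split())


    def test_parse_many_numbers(benchmark):
        # the benchmark fixture times the callable and returns its result
        result = benchmark(parse_many_numbers)
        assert result > 0

    # Reference run, on the reference commit:
    #   pytest autotest/benchmark --benchmark-save=ref
    # PR run, failing if a test's mean runtime is more than 5% slower:
    #   pytest autotest/benchmark --benchmark-compare --benchmark-compare-fail=mean:5%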
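And a minimal sketch of the append-one-JSON-object-per-line storage mentioned in the list at the top. The field names (timestamp, host, cpu, metrics) are illustrative only; the real metrics.json uses its own schema:

    import json
    import platform
    import time


    def append_run(path, metrics):
        # append one benchmark run as a single JSON line (newline-delimited JSON)
        record = {
            "timestamp": int(time.time()),
            "host": platform.node(),
            # recording the runner hardware makes CPU upgrades visible later
            "cpu": platform.processor(),
            "metrics": metrics,  # e.g. {"total time": 123.4, "build": 310.5}
        }
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


    def load_runs(path):
        # dashboard side: fetch the whole file and parse one record per line
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]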