Hi all,

Over the last couple of months we (at Collabora) have been working on a prototype for a Mesa testing system based on trace replays. It supports correctness regression testing and, in the future, performance regression testing.

We are aware that large-scale CI systems that perform extensive checks on Mesa already exist. However, our goal is not to reach that kind of scale or exhaustiveness, but to produce a system that is simple and robust enough to be maintained by the community, while being useful enough that the community will want to use and maintain it. We also want to make it fast enough that it can eventually be run on a regular basis, ideally in pre-commit fashion.

The current prototype focuses on the correctness aspect, replaying traces and comparing the resulting images against a set of reference images on multiple devices. At the moment we run on softpipe and intel/chromebook, but it's straightforward to add other devices through gitlab runners.

For the prototype we have used a simple approach to image comparison, storing a separate set of reference images per device and using exact image comparison, but we are also investigating alternative ways to deal with this. First results indicate that the frequency of reference image mismatches due to non-bug changes in Mesa is acceptable, but we will get a more complete picture once we have a richer set of traces and a longer CI run history.

The current design is based on an out-of-tree approach, where the tracie CI works independently of Mesa CI, fetching and building the latest Mesa on its own. We did this for maximum flexibility in the prototyping phase, but it has a complexity cost. Although we could continue to work this way, we would like to hear people's thoughts about eventually integrating more closely with Mesa, by becoming part of the upstream Mesa testing pipelines.

It's worth noting that over the last few months other people, most notably Eric Anholt, have made proposals to extend the scope of testing in CI. We believe there is much common ground here (multiple devices, deployment with gitlab runners) and room for cooperation and eventual integration into upstream Mesa. In the end, the main difference between all these efforts is the kind of tests (deqp, traces, performance) being run, which all have their place and offer different trade-offs.

We have also implemented a prototype dashboard to display the results, which we have deployed at:

https://tracie.freedesktop.org

We are working to improve the dashboard and provide more value by extracting and displaying additional information, e.g., "softpipe broken since commit NNN". The dashboard is currently specific to the trace playback results, but it would be nice to eventually converge on a single MesaCI dashboard covering all kinds of Mesa CI test results. We would be happy to help develop in this direction if there is interest.

You can find the CI scripts for tracie at:

https://gitlab.freedesktop.org/gfx-ci/tracie/tracie

Code for the dashboard is at:

https://gitlab.freedesktop.org/gfx-ci/tracie/tracie_dashboard

Here is an example of a failed CI job (for a purposefully broken Mesa commit) and the report of the failed trace (click on the red X to see the image diffs):

https://tracie.freedesktop.org/dashboard/job/642369/

Looking forward to your thoughts and comments.

Thanks,
Alexandros
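
P.S. In case it helps to make the "exact image comparison with per-device
references" approach concrete, here is a minimal Python sketch of the
idea. The directory layout, file names, and helper names are hypothetical
and for illustration only; the actual logic lives in the tracie scripts
linked above. "Exact" here means the replayed frame must match the stored
reference bit-for-bit.

    import hashlib
    import sys
    from pathlib import Path

    # Hypothetical layout: references/<device>/<trace>.png holds the
    # expected image for each (device, trace) pair, one set per device.
    REFERENCE_DIR = Path("references")

    def checksum(path: Path) -> str:
        """Return the SHA-256 digest of a file's contents."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def compare_trace_image(device: str, trace: str, replayed: Path) -> bool:
        """Exact comparison of a replayed frame against the per-device
        reference image; any difference counts as a failure."""
        reference = REFERENCE_DIR / device / f"{trace}.png"
        if not reference.exists():
            print(f"MISSING reference for {device}/{trace}")
            return False
        ok = checksum(replayed) == checksum(reference)
        print(f"{'PASS' if ok else 'FAIL'}: {device}/{trace}")
        return ok

    if __name__ == "__main__":
        # e.g.: compare.py softpipe glxgears out/glxgears-frame0.png
        device, trace, replayed = sys.argv[1:4]
        sys.exit(0 if compare_trace_image(device, trace, Path(replayed)) else 1)

Comparing decoded pixel data instead of file bytes would be an equally
valid way to do "exact" comparison, and the fuzzier alternatives we are
investigating would slot in at the same point.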
We are aware that large-scale CI systems that perform extensive checks on Mesa already exist. However, our goal is not to reach that kind of scale or exhaustiveness, but to produce a system that will be simple and robust enough to be maintained by the community, while being useful enough so that the community will want to use and maintain it. We also want to be able to make it fast enough so that it will be run eventually on a regular basis, ideally in pre-commit fashion. The current prototype focuses on the correctness aspect, replaying traces and comparing images against a set of reference images on multiple devices. At the moment, we run on softpipe and intel/chromebook, but it's straightforward to add other devices through gitlab runners. For the prototype we have used a simple approach for image comparison, storing a separate set of reference images per device and using exact image comparison, but we are also investigating alternative ways to deal with this. First results indicate that the frequency of reference image mismatches due to non-bug changes in Mesa is acceptable, but we will get a more complete picture once we have a richer set of traces and a longer CI run history. The current design is based on an out-of-tree approach, where the tracie CI works independently from Mesa CI, fetching and building the latest Mesa on its own. We did this for maximum flexibility in the prototyping phase, but this has a complexity cost, and although we could continue to work this way, we would like to hear people's thoughts about eventually integrating with Mesa more closely, by becoming part of the upstream Mesa testing pipelines. It's worth noting that the last few months other people, most notably Eric Anholt, have made proposals to extend the scope of testing in CI. We believe there is much common ground here (multiple devices, deployment with gitlab runners) and room for cooperation and eventual integration into upstream Mesa. In the end, the main difference between all these efforts are the kind of tests (deqp, traces, performance) that are being run, which all have their place and offer different trade-offs. We have also implemented a prototype dashboard to display the results, which we have deployed at: https://tracie.freedesktop.org We are working to improve the dashboard and provide more value by extracting and displaying additional information, e.g., "softpipe broken since commit NNN". The dashboard is currently specific to the trace playback results, but it would be nice to eventually converge to a single MesaCI dashboard covering all kinds of Mesa CI test results. We would be happy to help develop in this direction if there is interest. You can find the CI scripts for tracie at: https://gitlab.freedesktop.org/gfx-ci/tracie/tracie Code for the dashboard is at: https://gitlab.freedesktop.org/gfx-ci/tracie/tracie_dashboard Here is an example of a failed CI job (for a purposefully broken Mesa commit) and the report of the failed trace (click on the red X to see the image diffs): https://tracie.freedesktop.org/dashboard/job/642369/ Looking forward to your thoughts and comments. Thanks, Alexandros _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev