Eero Tamminen <eero.t.tammi...@intel.com> writes:

> Hi,
>
> On 27.9.2019 4.32, Eric Anholt wrote:
>> Alexandros Frantzis <alexandros.frant...@collabora.com> writes:
>>> The last couple of months we (at Collabora) have been working on a
>>> prototype for a Mesa testing system based on trace replays that supports
>>> correctness regression testing and, in the future, performance
>>> regression testing.
>>>
>>> We are aware that large-scale CI systems that perform extensive checks
>>> on Mesa already exist. However, our goal is not to reach that kind of
>>> scale or exhaustiveness, but to produce a system that will be simple and
>>> robust enough to be maintained by the community, while being useful
>>> enough so that the community will want to use and maintain it. We also
>>> want to be able to make it fast enough so that it will eventually be run
>>> on a regular basis, ideally in pre-commit fashion.
>>>
>>> The current prototype focuses on the correctness aspect, replaying
>>> traces and comparing images against a set of reference images on
>>> multiple devices. At the moment, we run on softpipe and
>>> intel/chromebook, but it's straightforward to add other devices through
>>> gitlab runners.
>>>
>>> For the prototype we have used a simple approach for image comparison,
>>> storing a separate set of reference images per device and using exact
>>> image comparison, but we are also investigating alternative ways to deal
>>> with this. First results indicate that the frequency of reference image
>>> mismatches due to non-bug changes in Mesa is acceptable, but we will get
>>> a more complete picture once we have a richer set of traces and a longer
>>> CI run history.
>
> For CI, I think discarding/ignoring traces that are too unstable or too
> slow would be perfectly acceptable. [1]
>
>> Some missing context: I was told that over 2400 commits, in glmark2 + a
>> couple of other open source traces, on intel, there was one spurious
>> failure due to this diff method. This is lower than I felt like it was
>> when I did this in piglit on vc4, but then I was very actively changing
>> optimization in the compiler while I was using that tool.
>
> A few years ago, when I was looking at the results from ezBench (which was
> bisecting Mesa commit ranges for build, run-time, performance and
> rendering issues at the same time), it was very useful to have rendering
> diff results in addition to the performance numbers.
>
> Rendering didn't change too often, but one needs to look at every changed
> screenshot directly; error metrics about them aren't enough. An innocent
> accuracy difference due to a change in calculation order can cause e.g. a
> marginally different color over a huge area of the rendered result [1],
> whereas a real rendering error can be just some tiny reflection missing
> from the render, which one would never notice while running the benchmark
> (even with a correct one running beside it); one only sees them in static
> screenshots.
>
> [1] Whether screenshots are affected by calculation changes depends a lot
> on the benchmark, i.e. on how stable its calculations are with regard to
> accuracy variations. Some benchmarks even use randomness in their
> shaders...
>
> (If I remember correctly, a good example of unstable results were some of
> the GpuTest benchmarks.)
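
(As a concrete illustration of the kind of check being discussed above,
i.e. exact comparison against a per-device reference image plus a
difference image a human can inspect when something changes, here is a
minimal Python sketch. This is just my own illustration assuming Pillow
and PNG screenshots; it is not tracie's actual code, and the file handling
is made up.)

  # Minimal sketch: exact image comparison against a stored reference,
  # writing a per-pixel difference image on mismatch for human review.
  # Illustrative only, not tracie's real implementation.
  import sys

  from PIL import Image, ImageChops  # Pillow

  def compare(reference_png, result_png, diff_png):
      ref = Image.open(reference_png).convert("RGB")
      res = Image.open(result_png).convert("RGB")

      # Exact comparison: any single differing pixel is a mismatch.
      if ref.size == res.size and ref.tobytes() == res.tobytes():
          return True

      # On mismatch, also save a difference image so a human can judge
      # whether it's an innocent accuracy change or a real rendering bug.
      if ref.size == res.size:
          ImageChops.difference(ref, res).save(diff_png)
      return False

  if __name__ == "__main__":
      sys.exit(0 if compare(*sys.argv[1:4]) else 1)

Something like this keeps the pass/fail decision mechanical, while still
producing the artifact needed for the "look at every changed screenshot"
workflow described above.
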
>>> The current design is based on an out-of-tree approach, where the tracie
>>> CI works independently from Mesa CI, fetching and building the latest
>>> Mesa on its own. We did this for maximum flexibility in the prototyping
>>> phase, but this has a complexity cost, and although we could continue to
>>> work this way, we would like to hear people's thoughts about eventually
>>> integrating with Mesa more closely, by becoming part of the upstream
>>> Mesa testing pipelines.
>>>
>>> It's worth noting that over the last few months other people, most
>>> notably Eric Anholt, have made proposals to extend the scope of testing
>>> in CI. We believe there is much common ground here (multiple devices,
>>> deployment with gitlab runners) and room for cooperation and eventual
>>> integration into upstream Mesa. In the end, the main difference between
>>> all these efforts is the kind of tests (deqp, traces, performance) that
>>> are being run, which all have their place and offer different
>>> trade-offs.
>>>
>>> We have also implemented a prototype dashboard to display the results,
>>> which we have deployed at:
>>>
>>> https://tracie.freedesktop.org
>>>
>>> We are working to improve the dashboard and provide more value by
>>> extracting and displaying additional information, e.g., "softpipe broken
>>> since commit NNN".
>>>
>>> The dashboard is currently specific to the trace playback results, but
>>> it would be nice to eventually converge to a single MesaCI dashboard
>>> covering all kinds of Mesa CI test results. We would be happy to help
>>> develop in this direction if there is interest.
>>>
>>> You can find the CI scripts for tracie at:
>>>
>>> https://gitlab.freedesktop.org/gfx-ci/tracie/tracie
>>>
>>> Code for the dashboard is at:
>>>
>>> https://gitlab.freedesktop.org/gfx-ci/tracie/tracie_dashboard
>>>
>>> Here is an example of a failed CI job (for a purposefully broken Mesa
>>> commit) and the report of the failed trace (click on the red X to
>>> see the image diffs):
>>>
>>> https://tracie.freedesktop.org/dashboard/job/642369/
>>>
>>> Looking forward to your thoughts and comments.
>>
>> A couple of thoughts on this:
>>
>> A separate dashboard is useful if we have traces that are too slow to
>> run pre-merge or are not redistributable. For traces that are
>> redistributable and cheap to run, we should run them in our CI and block
>> the merge, instead of having someone watch an external dashboard and
>> report things to get patched up after regressions have already landed.
>>
>> I'm reluctant to add "maintain a web service codebase" as one of the
>> things that the Mesa project does, if there are alternatives that don't
>> involve that. I've been thinking about a perf dashboard, and for that
>> I'd like to reuse existing open source projects like grafana. If we
>> start our own dashboard project, are we going to end up reimplementing
>> that one?
>
> FYI: We've tried Grafana, and while it's otherwise nice and fast, we
> didn't find a way to get each data point in a graph to be a link to
> additional data (logs, screenshots, etc.), which IMHO makes it much less
> useful for trend tracking.
>
> (If there actually *is* a way to add links to each data point, I would
> be very much interested.)
Looks like one can do so at the graph level now:

https://grafana.com/docs/features/panels/graph/

I thought there was also something that let a metric have arbitrary extra
data attached, but I'm not finding it in a quick search right now.
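
For example, something along these lines in the panel JSON should attach a
link to each point in a graph panel. This is only a rough sketch: the
option names follow the Grafana 6.x graph panel as far as I understand it,
and the tracie URL parameters are made up, so double-check against the
docs for whatever version actually gets deployed.

  # Sketch: build a Grafana graph panel definition with a data link,
  # so clicking a point jumps to the logs/screenshots for that run.
  # Option names and URL parameters are assumptions for illustration.
  import json

  panel = {
      "type": "graph",
      "title": "frame time",
      "options": {
          "dataLinks": [
              {
                  "title": "Open logs/screenshots for this run",
                  # ${__series.name} and ${__value.time} are Grafana
                  # data-link variables, substituted per data point;
                  # the query parameters here are hypothetical.
                  "url": "https://tracie.freedesktop.org/dashboard/"
                         "?series=${__series.name}&time=${__value.time}",
              }
          ]
      },
  }

  print(json.dumps(panel, indent=2))

If that works as advertised, each point in a perf graph could link
straight back to the corresponding job page on the tracie dashboard.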