Hi all, as the title says. The implementation uses a compute shader to summarize data from the query buffers. As long as only one query buffer is in flight (the normal case), that compute shader is launched exactly once, on a single thread. If multiple buffers are required, one compute grid is launched per buffer, in sequence.
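To make the launch pattern concrete, here is a rough CPU-side sketch in plain C. It is not the actual driver code: `struct query_buffer`, `summarize_buffer` and `summarize_query` are hypothetical names made up for illustration, and the "shader" is modelled as an ordinary function that folds (begin, end) result pairs into a running total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one query buffer: an array of (begin, end)
 * counter snapshots, chained to the buffer that was filled before it. */
struct query_buffer {
    const uint64_t (*pairs)[2];   /* pairs[i][0] = begin, pairs[i][1] = end */
    size_t num_pairs;
    const struct query_buffer *previous;
};

/* Models the single-threaded compute shader: folds one buffer's result
 * pairs into the running total and returns the new total. */
static uint64_t summarize_buffer(const struct query_buffer *buf, uint64_t acc)
{
    for (size_t i = 0; i < buf->num_pairs; i++)
        acc += buf->pairs[i][1] - buf->pairs[i][0];
    return acc;
}

/* One "launch" per buffer, strictly in sequence. Returns the total and
 * stores the number of launches in *launches. */
static uint64_t summarize_query(const struct query_buffer *last,
                                unsigned *launches)
{
    uint64_t acc = 0;

    assert(launches != NULL);
    *launches = 0;
    for (const struct query_buffer *buf = last; buf; buf = buf->previous) {
        acc = summarize_buffer(buf, acc);
        (*launches)++;
    }
    return acc;
}
```

With a single buffer the loop body runs exactly once, which corresponds to the normal case described above; each additional buffer adds one more sequential launch that carries the accumulated result forward.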
All of this could be done in much fancier ways using bindless buffers and wave-wide computations, but really, the expectation is that most queries will be rather simple (though occlusion queries always contain at least 8 result pairs, so it's not like it would be completely pointless).

This code also exposes the hilarious lowering of 64-bit integer divides in LLVM, since timestamp queries use one. The lowering generates more than 2KB of code for a single division, which is excessive even when the division *isn't* by a constant. The right place to fix this is in LLVM, and I'm already looking into it. For normal queries this is completely irrelevant because the division code will just be skipped.

Please review!

Thanks,
Nicolai