On 09/16/2016 06:57 AM, Nicolai Hähnle wrote:
> Hi all,
>
> as the title says. The implementation uses a compute shader to summarize
> data from the query buffers. As long as only one query buffer is in flight
> (the normal case), that compute shader is launched exactly once, on a
> single thread. If multiple buffers were required, then one compute grid is
> launched for each of these buffers, in sequence.
>
> All of this could be done in much fancier ways using bindless buffers and
> wave-wide computations, but really, the expectation is that most queries
> will be rather simple (though occlusion queries always contain at least 8
> result pairs, so it's not like it would be completely pointless).
>
> This code also exposes the hilarious lowering of 64-bit integer divides
> in LLVM, since timestamp queries use it. This lowering generates more than
> 2KB of code for a single division, which is excessive even when the division
> *isn't* by a constant. The right place to fix this is in LLVM, and I'm
> already looking into it. For normal queries this is completely irrelevant
> because the code will just be skipped.

Is the division by a constant? If it is, you might want to use something
like what libdivide would generate.

> Please review!
> Thanks
> Nicolai
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
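
For context on the libdivide suggestion above: when the divisor is fixed ahead
of time, a 64-bit unsigned divide can be replaced by one high multiply plus a
few shifts and adds. What follows is only a minimal sketch in C of the classic
Granlund-Montgomery "round-up" scheme, which belongs to the same family of
multiply-and-shift techniques; it is not libdivide's actual code, nor what
LLVM or Mesa emit. The names udiv64_magic, udiv64_prepare and udiv64_apply are
hypothetical, and the sketch assumes a compiler that provides
unsigned __int128 (a GCC/Clang extension) and a non-zero divisor.

```c
#include <stdint.h>

/* Precomputed constants for dividing by one fixed divisor (hypothetical type). */
struct udiv64_magic {
    uint64_t mul;   /* multiplier m' = floor(2^64 * (2^l - d) / d) + 1 */
    unsigned sh1;   /* first shift:  min(l, 1)     */
    unsigned sh2;   /* second shift: max(l - 1, 0) */
};

/* High 64 bits of the 128-bit product a * b. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* One-time setup for divisor d. Assumption: d != 0. */
static struct udiv64_magic udiv64_prepare(uint64_t d)
{
    struct udiv64_magic m;
    unsigned l = 0;

    /* l = ceil(log2(d)): smallest l with 2^l >= d. */
    while (l < 64 && ((unsigned __int128)1 << l) < d)
        l++;

    /* (2^l - d) < d < 2^64, so the shifted value still fits in 128 bits. */
    unsigned __int128 num = (((unsigned __int128)1 << l) - d) << 64;

    m.mul = (uint64_t)(num / d) + 1;
    m.sh1 = l < 1 ? l : 1;
    m.sh2 = l > 1 ? l - 1 : 0;
    return m;
}

/* Computes n / d using only a multiply, a subtract, an add and two shifts. */
static uint64_t udiv64_apply(const struct udiv64_magic *m, uint64_t n)
{
    uint64_t t = mulhi_u64(m->mul, n);          /* t <= n, so n - t cannot wrap */
    return (t + ((n - t) >> m->sh1)) >> m->sh2;
}
```

The expensive 128-bit division happens only in udiv64_prepare, which could run
once on the CPU for a given divisor; udiv64_apply is the part that would
replace the per-value divide. Whether this helps for the timestamp path
depends on the answer to the question above, namely whether the divisor is
actually known before the shader runs.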