On Friday, May 6, 2016 11:42:49 PM PDT Kenneth Graunke wrote:
> My old implementation accumulated <start, end> pairs in a buffer,
> and eventually processed that data on the CPU. This meant flushing
> the batchbuffer and waiting for it to completely execute before we
> could map it, resulting in really long stalls. We could also run out
> of space in the buffer, and have to do this early.
>
> Instead, we can use Haswell's MI_MATH command to do the (end - start)
> subtraction, as well as the multiplication by 2 or 3 to convert from
> the number of primitives written to the number of vertices written.
> We still need to CS stall to read the counters, but otherwise everything
> is completely pipelined - there's no CPU<->GPU synchronization required.
> It also uses only 80 bytes in the buffer, no matter what.
>
> Improves performance in Manhattan on Skylake GT3e at 800x600 by
> 6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance
> by 2.82103% +/- 0.148596% (n=84).
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Sorry, I forgot to do the s/has_mi_math/has_mi_math_and_lrr/ before sending. Fixed locally.
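
For anyone following along, the arithmetic described in the quoted commit message can be modelled on the CPU roughly as below. This is only a sketch of the math, not the patch itself: the patch does this on the GPU with MI_MATH, and the names here (prims_written, vertices_written) are hypothetical and do not come from Mesa. It assumes 64-bit counters and that a multiplier of 1 would apply to points, which the message does not mention explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side model of the counter math; not Mesa code. */

/* Primitives written between two counter snapshots. Unsigned
 * subtraction stays correct even if the counter wrapped once. */
static uint64_t
prims_written(uint64_t start, uint64_t end)
{
   return end - start;
}

/* Convert primitives to vertices: 2 for lines, 3 for triangles
 * (1 for points is an assumption, not stated in the message). */
static uint64_t
vertices_written(uint64_t start, uint64_t end, unsigned verts_per_prim)
{
   assert(verts_per_prim >= 1 && verts_per_prim <= 3);
   return prims_written(start, end) * verts_per_prim;
}
```

For example, snapshots of 100 and 110 with triangles give 10 primitives and 30 vertices. Because only a running result is kept instead of a list of <start, end> pairs, the storage needed does not grow with the number of draws.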