On Friday, May 6, 2016 11:42:49 PM PDT Kenneth Graunke wrote:
> My old implementation accumulated <start, end> pairs in a buffer,
> and eventually processed that data on the CPU.  This meant flushing
> the batchbuffer and waiting for it to completely execute before we
> could map it, resulting in really long stalls.  We could also run out
> of space in the buffer, and have to do this early.
> 
> Instead, we can use Haswell's MI_MATH command to do the (end - start)
> subtraction, as well as the multiplication by 2 or 3 to convert from
> the number of primitives written to the number of vertices written.
> We still need to CS stall to read the counters, but otherwise everything
> is completely pipelined - there's no CPU<->GPU synchronization required.
> It also uses only 80 bytes in the buffer, no matter what.
> 
> Improves performance in Manhattan on Skylake GT3e at 800x600 by
> 6.1086% +/- 0.954166% (n=9).  At 1920x1080, improves performance
> by 2.82103% +/- 0.148596% (n=84).
> 
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Sorry, I forgot to do the s/has_mi_math/has_mi_math_and_lrr/ before
sending.  Fixed locally.
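
For readers following along, the arithmetic described in the commit message can be sketched on the CPU side as below. This is only a reference model of the computation, not the MI_MATH command stream the patch emits. The names prim_count_pair and vertices_written are hypothetical, and summing the per-pair deltas is an assumption about how the <start, end> pairs are meant to be combined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot pair: the "primitives written" counter sampled
 * at the start and at the end of one interval. */
struct prim_count_pair {
   uint64_t start;
   uint64_t end;
};

/* Reference model only: sum (end - start) over all pairs, then scale by
 * the number of vertices per primitive (2 for lines, 3 for triangles)
 * to get the number of vertices written. */
static uint64_t
vertices_written(const struct prim_count_pair *pairs, size_t count,
                 unsigned verts_per_prim)
{
   uint64_t prims = 0;

   for (size_t i = 0; i < count; i++)
      prims += pairs[i].end - pairs[i].start;

   return prims * verts_per_prim;
}
```

With MI_MATH, the same subtraction and multiplication happen on the GPU, so the CPU never has to map the buffer and wait for the batch to finish.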


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev