Quoting Chris Wilson (2017-10-13 10:34:54)
> The primary benefit for this is that we get format conversion for
> "free", along with detiling and cache flushing (most relevant for !llc).
> Using the GPU does impose a bandwidth cost that is presumably better
> used for rendering, hence we limit the use to readback into client
> memory (not pbo) where we would need to stall on the GPU anyway.
> (Uploads remain direct/staged to avoid the synchronisation cost.)
> And we only use the GPU path if a direct read into client memory from
> video memory is unavailable.
>
> The ultimate user of this is Xorg/glamor! On byt, bsw, bxt (and
> presumably but not measured ilk), x11perf -shmget500 is improved by
> 15-fold. Though conversely the overhead of executing and waiting upon an
> additional blorp batch is shown by x11perf -shmget10 being reduced by a
> factor of 2. I think it is fair to presume that large copies will
> dominate (and that the overhead of a single batch is something that we
> can iteratively reduce, for the benefit of all.) llc machines continue to
> use direct access where there is no format changes (which one hopes is
> the typical use case).
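
For reference, the path selection described in the quoted text could be sketched roughly as below. This is only an illustration, not the code from the patch: every name in it (readback_path, readback_request, choose_readback_path) is hypothetical, and it assumes that "direct read available" reduces to "llc and no format change", which is a simplification of what the driver would actually check.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not taken from the patch. */
enum readback_path {
    READBACK_DIRECT,   /* CPU reads straight from the buffer (tiled memcpy) */
    READBACK_GPU_BLIT, /* extra blorp batch converts and detiles, then we wait */
    READBACK_FALLBACK  /* whatever path the driver used before this change */
};

struct readback_request {
    bool dst_is_pbo;        /* destination is a pbo rather than client memory */
    bool has_llc;           /* CPU and GPU share the last-level cache */
    bool needs_format_conv; /* source and destination formats differ */
};

static enum readback_path
choose_readback_path(const struct readback_request *req)
{
    /* The GPU path is limited to readback into client memory, where we
     * would have to stall on the GPU anyway; pbo destinations are left
     * on the existing path. */
    if (req->dst_is_pbo)
        return READBACK_FALLBACK;

    /* Assumption: a direct read is usable on llc when no format change
     * is needed. In that case keep the direct access. */
    if (req->has_llc && !req->needs_format_conv)
        return READBACK_DIRECT;

    /* Otherwise pay for one more batch and get format conversion,
     * detiling and cache flushing from the GPU. */
    return READBACK_GPU_BLIT;
}
```

Uploads are deliberately absent from the sketch, since they remain direct/staged to avoid the synchronisation cost.
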

Ah, the llc direct-access case needs some improvements to the direct read
path (intel_gettexsubimage_tiled_memcpy) so that it handles subimages for
llc + Xorg. I have those in the older patches that enable userptr
readback; I'll dig them out again.
-Chris
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev