On Monday, July 24, 2017 3:54:11 AM PDT Emil Velikov wrote:
> Hi Ken,
>
> Admittedly I'm not an expert in the area, so perhaps a rather silly question.
>
> On 22 July 2017 at 00:17, Kenneth Graunke <kenn...@whitecape.org> wrote:
>
> > +#ifdef USE_SSE41
> > +   if (!cache->bo->cache_coherent && cpu_has_sse4_1)
> > +      _mesa_streaming_load_memcpy(map, cache->map, cache->next_offset);
> > +   else
> > +#endif
> > +      memcpy(map, cache->map, cache->next_offset);
> The other user of _mesa_streaming_load_memcpy -
> intel_miptree_map/intel_miptree_map_movntdqa does not seem to check
> for the coherency flag.
>
> Which makes me wonder:
> Did you intentionally combine the SSE4.1 check with the
> !cache_coherent one, should there be a similar check in the miptree
> code or the two cases are orthogonal?
>
> Thanks
> Emil
>
The other code uses brw->has_llc. Basically, on LLC platforms, all buffers
other than scanout are coherent. On non-LLC, almost all buffers are
non-coherent. We originally didn't have a bo->cache_coherent flag, and
used brw->has_llc as the distinguishing factor.

On non-LLC, you can make buffers coherent by enabling snooping (but it's
expensive). We haven't ever done that yet, though Chris has patches to do
so for query object buffers, where we want the CPU and GPU to be able to
read an "I'm done!" flag for CheckQuery.

So, it would probably be reasonable to change intel_miptree_map to use
bo->cache_coherent instead of brw->has_llc, though this is unlikely to
matter in practice since snooping for textures doesn't make much sense,
and on LLC systems, we're probably not going to map the scanout buffer.

MOVNTDQA gives faster streaming read performance when sourcing from
uncached memory, apparently. Non-coherent BOs bypass the CPU caches, so
we want to use it there.
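
To make the selection logic concrete, here is a rough standalone sketch of
the idea, not the actual Mesa code. Every sketch_* name is hypothetical, the
struct only stands in for the real BO, and the copy loop is a simplified
stand-in for _mesa_streaming_load_memcpy. It assumes GCC or Clang on x86
(for the target attribute and __builtin_cpu_supports) and falls back to
plain memcpy everywhere else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <smmintrin.h>
#define SKETCH_HAVE_SSE41 1
#else
#define SKETCH_HAVE_SSE41 0
#endif

/* Hypothetical stand-in for the buffer object; only the flag matters here. */
struct sketch_bo {
   bool cache_coherent;
};

#if SKETCH_HAVE_SSE41
/* Simplified streaming-load copy: MOVNTDQA needs a 16-byte aligned source,
 * so the unaligned head and the short tail are copied with memcpy. */
__attribute__((target("sse4.1")))
static void
sketch_streaming_load_copy(void *dst, const void *src, size_t len)
{
   uint8_t *d = dst;
   const uint8_t *s = src;

   /* Bytes needed to bring the source up to a 16-byte boundary. */
   size_t head = (size_t)(-(uintptr_t)s & 15);
   if (head > len)
      head = len;
   memcpy(d, s, head);
   d += head;
   s += head;
   len -= head;

   while (len >= 16) {
      /* Streaming (non-temporal) load from the source, ordinary store. */
      __m128i v = _mm_stream_load_si128((__m128i *)(uintptr_t)s);
      _mm_storeu_si128((__m128i *)d, v);
      d += 16;
      s += 16;
      len -= 16;
   }

   memcpy(d, s, len);
}
#endif

/* Use the streaming path only for non-coherent buffers on CPUs that
 * report SSE4.1 at runtime; otherwise use a plain memcpy. */
static void
sketch_copy_from_bo(void *dst, const struct sketch_bo *bo,
                    const void *map, size_t len)
{
#if SKETCH_HAVE_SSE41
   if (!bo->cache_coherent && __builtin_cpu_supports("sse4.1")) {
      sketch_streaming_load_copy(dst, map, len);
      return;
   }
#endif
   (void)bo;
   memcpy(dst, map, len);
}
```

On ordinary cached memory both paths produce the same bytes; the streaming
load is only expected to pay off when the source mapping is uncached, which
is why the check keys off the coherency flag rather than SSE4.1 alone.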
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev