> Well, don't forget that you have to populate the tile from somewhere -
> so you'll hit all of the same cachelines that the non-swizzled version
> would have.
>
> We still get locality from binning, meaning that all accesses to a group
> of cachelines come in a single burst, after which they are done with and
> can migrate out of L1 cache according to the processors own mechanisms.
> With swizzling, we need to write them out ourselves, and try and do so
> without blowing the caches (which is possible with non-temporal writes,
> but it's still an extra operation).

Couldn't you just keep resources permanently in swizzled format?
(Un)swizzling should then be necessary only for transfers and for
displaying to the screen, both of which should be relatively rare.

Most hardware GPUs do this, although some can scan out directly from
swizzled buffers. You could even use the hardware OpenGL implementation
to perform the unswizzling to the display.

> A tile covers the same number of cachelines either way, and in normal
> rendering the entire tile gets written, even if it's just with the clear
> color.

That's right for render targets. For textures, however, swizzling might
reduce cache misses on some access patterns.

It's also no longer the case if binning is removed and the whole pipeline
from vertex buffers to render targets is done in a single call (which
might be beneficial if there are a lot of pixel-sized tessellated
triangles, but I'm not sure).

However, if swizzling does not actually provide any clear benefit in
practice, then removing it probably makes sense, since that would
significantly simplify the codebase.

> It's also worth noting that things like improving texture sampling are
> far higher on the list. We do quite well in non-textured mesa demos
> relative to i965 (50% from a single core seems typical), but drop behind
> drastically in things like tunnel, etc. Swizzling or not won't bridge
> that gap.

Unfortunately, special-purpose hardware helps a lot here, as the Larrabee
plans showed by keeping dedicated texture sampling hardware.

Are the issues due to cache misses, or simply to the number of
instructions needed for each texture sample?

Artificially forcing nearest filtering might help a lot, but at an
obvious and drastic cost in quality.

Analyzing the shader and, if it only projects a texture, doing some
special approximate projection pass on the texture itself could perhaps
also be useful, but it looks like a lot of work.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev