Evergreen and later chipsets can sample from compressed colorbuffers. Cayman and later chipsets cannot even decompress them. On those chipsets, the decompression code only converts the CMASK+FMASK combo to a texturable FMASK.
Marek

On Wed, Jul 17, 2013 at 2:52 AM, Grigori Goronzy <g...@chown.ath.cx> wrote:
> On 17.07.2013 02:05, Marek Olšák wrote:
>>
>> No, it's not faster, but it's not slower either.
>>
>> Now that I think about it, I can't come up with a good shader-based
>> algorithm for the resolve operation.
>>
>> I don't think Christoph's approach that an MSAA texture can be viewed
>> as a larger single-sample texture is correct, because the physical
>> locations of the samples in memory usually do not correspond to the
>> sample locations the 3D engine used for rasterization, so fetching a
>> texel from the larger texture at (x,y) physical coordinates won't
>> always return the closest rasterized sample at those coordinates. Also
>> the bilinear filter would be horrible in this case, because it only
>> takes 4 samples per pixel.
>>
>> Now let's consider implementing the scaled resolve operation in the
>> shader by texelFetch-ing all samples and using a bilinear filter. For
>> Nx MSAA, there would be N*4 texel fetches per pixel; in comparison,
>> separate resolve+blit needs only N+4 texel fetches per pixel. In
>> addition to that, the resolve is a special fixed-function blending
>> operation and the fragment shader is not even executed. See? Separate
>> resolve+blit beats everything.
>>
>
> AFAICS the point of the spec is that it allows cheaper approximations that
> don't use all texels and it allows the implementation to avoid writes to a
> temp texture, both to save memory bandwidth. I am not sure if it is
> reasonably possible to do this (without causing aliasing). How does scaled
> blit on Intel hardware perform compared to resolve+blit? Maybe it helps on
> bandwidth-constrained GPU configurations.
>
> In terms of memory bandwidth per pixel, resolve+blit needs N reads and 1
> write for the resolve step and 1 read for the blit step. If we assume 100%
> hit rate for the texture cache, scaled blit needs only N reads and that's
> it. So in theory it may work. OTOH, compressed colorbuffers and fast clear
> that are used by r600g should reduce actual bandwidth requirements for the
> resolve step a lot. And we cannot take advantage of the compression when
> we're sampling from colorbuffers. I probably just answered this myself:
> resolve+blit is easier and better at least on Radeon hardware. :)
>
> Grigori
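
For reference, the per-pixel counts quoted above can be tallied with a small Python sketch. It only restates the arithmetic from the thread (N*4 vs. N+4 texel fetches, and N+2 vs. N memory accesses). The function names are hypothetical. The numbers assume a bilinear filter with 4 taps, roughly 1:1 scaling so that "per pixel" means the same thing in both steps, a 100% texture cache hit rate for the scaled blit, and no CMASK/FMASK compression or fast clear. This is not a model of real hardware.

```python
BILINEAR_TAPS = 4  # a bilinear filter reads a 2x2 footprint


def fetches_shader_scaled_resolve(n_samples: int) -> int:
    """Texel fetches per pixel when one shader pass fetches every sample
    of each of the 4 bilinear taps (the N*4 figure from the thread)."""
    return n_samples * BILINEAR_TAPS


def fetches_resolve_then_blit(n_samples: int) -> int:
    """Texel fetches per pixel for a separate resolve (N samples) followed
    by a bilinear blit from the resolved texture (4 taps): N+4."""
    return n_samples + BILINEAR_TAPS


def traffic_resolve_then_blit(n_samples: int) -> int:
    """Memory accesses per pixel as counted in the thread: N reads and
    1 write for the resolve, plus 1 read for the blit."""
    return n_samples + 1 + 1


def traffic_scaled_blit(n_samples: int) -> int:
    """Memory accesses per pixel for a single scaled blit, assuming a
    100% texture cache hit rate: N reads, no temporary texture."""
    return n_samples


if __name__ == "__main__":
    print("N  fetches(shader)  fetches(resolve+blit)  "
          "traffic(resolve+blit)  traffic(scaled)")
    for n in (2, 4, 8):
        print(n,
              fetches_shader_scaled_resolve(n),
              fetches_resolve_then_blit(n),
              traffic_resolve_then_blit(n),
              traffic_scaled_blit(n))
```

The gap in fetch count grows with N, while the bandwidth difference stays at a constant 2 accesses per pixel, which is the part compression and fast clear are expected to shrink further on r600g.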