On 11.02.2014 22:58, Dave Airlie wrote:
>>>    dst.z = texture_depth(unit, lod)
>>>
>>> +.. opcode:: TG4 - Texture Gather (as per ARB_texture_gather)
>>> +               Gathers the four texels to be used in a bi-linear
>>> +               filtering operation and packs them into a single register.
>>> +               Only works with 2D, 2D array, cubemaps, and cubemaps arrays.
>>> +               For 2D textures, only the addressing modes of the sampler and
>>> +               the top level of any mip pyramid are used. Set W to zero.
>>> +               It behaves like the TEX instruction, but a filtered
>>> +               sample is not generated. The four samples that contribute
>>> +               to filtering are placed into xyzw in clockwise order,
>>> +               starting with the (u,v) texture coordinate delta at the
>>> +               following locations (-, +), (+, +), (+, -), (-, -), where
>>> +               the magnitude of the deltas are half a texel.
>>> +
>>> +               PIPE_CAP_TEXTURE_SM5 enhances this instruction to support
>>> +               shadow per-sample depth compares, single component selection,
>>> +               and a non-constant offset. It doesn't allow support for the
>>> +               GL independent offset to get i0,j0. This would require another
>>> +               CAP is hw can do it natively. For now we lower that before
>>> +               TGSI.
>>> +
>>> +.. math::
>>> +
>>> +   coord = src0
>>> +
>>> +   component = src1
>>> +
>>> +   dst = texture_gather4 (unit, coord, component)
>>> +
>>> +(with SM5 - cube array shadow)
>>> +
>>> +   coord = src0
>>> +
>>> +   compare = src1
>>> +
>>> +   dst = texture_gather (uint, coord, compare)
>>> +
>> So how does component selection work with the latter version?
>> I think it would be nice if you wouldn't really need two versions (so if
>> you don't support comparisons, the src would just be unused).
>
> That's docs not being clear enough if you read it like that. The
> second version is only for cube array shadow compares, which have no
> components. The first version is the same for non-shadow compares.

Ah right, that works. I forgot you don't need the channel select with shadow comparisons (not that I'm a big fan of such "overloaded" sources, but that's nothing new really).
>
>> Also, FWIW for llvmpipe you'd probably wanted a native 4 offsets
>> versions, I don't think llvm could eliminate the huge amount of
>> duplicated code completely if you generate 4 texture lookups. Of course,
>> someone would need to implement it first (shouldn't be too difficult).
>
> Yeah llvmpipe might be in the category for using the extra CAP, I'm
> really hoping nvidia hw does do this, but the interface is kinda
> arbitrary and maybe we should consider another opcode,
>
> Since we have for SM5 nonconstant ones something like,
>
> TG4 TEMP[1], TEMP[1], SAMP[0] , TEMP[2].xyz
> which will sample around temp[1] i0,j0 - i1, j1 at the offset in temp[2]
>
> and
> TG4 TEMP[1], TEMP[1], SAMP[0], TEMP[2].xyz, TEMP[3].xyz, TEMP[4].xyz,
> TEMP[5].xyz
> which will sample i0,j0 from TEMP[1] and the respective offsets.
>

Yes, since the offsets are in a separate offset structure and the number of
offsets is indicated, I think it should just work if a driver wants to
implement multiple offsets natively.

Roland
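
For illustration, the difference between lowering the four-offset form to four lookups and handling it natively can be sketched in Python as below. This is a hypothetical model, not Mesa or llvmpipe code: the function names, the single-channel list-of-rows texture, unnormalized coordinates, and clamp-to-edge addressing are assumptions, and the sketch assumes offset n feeds destination channel n.

```python
import math

# Half-texel (u, v) deltas for dst.x, dst.y, dst.z, dst.w, in the order
# given by the quoted TG4 docs. The last entry, (-, -), is texel i0,j0.
DELTAS = [(-0.5, 0.5), (0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)]


def fetch(tex, x, y):
    # Assumed clamp-to-edge addressing on a single-channel texture.
    h, w = len(tex), len(tex[0])
    return tex[max(0, min(h - 1, y))][max(0, min(w - 1, x))]


def gather_offset(tex, u, v, offset):
    """Single-offset gather: the full four-texel footprint at coord + offset."""
    ou, ov = offset
    return tuple(
        fetch(tex, math.floor(u + ou + du), math.floor(v + ov + dv))
        for du, dv in DELTAS
    )


def gather_offsets_lowered(tex, u, v, offsets):
    """Lowered form: four full gathers, keeping only i0,j0 (.w) of each."""
    return tuple(gather_offset(tex, u, v, off)[3] for off in offsets)


def gather_offsets_native(tex, u, v, offsets):
    """Native form: one texel fetch per offset, no discarded work."""
    return tuple(
        fetch(tex, math.floor(u + ou - 0.5), math.floor(v + ov - 0.5))
        for ou, ov in offsets
    )
```

The lowered version fetches sixteen texels to keep four, which is the duplicated work mentioned for llvmpipe; the native version fetches only the four it needs and returns the same values.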