----- Original Message ----- > On 15.12.2011 20:09, Jose Fonseca wrote: > > ----- Original Message ----- > >> On 12/14/2011 12:58 AM, Ian Romanick wrote: > >>> On 12/13/2011 01:25 PM, Jose Fonseca wrote: > >>>> > >>>> ----- Original Message ----- > >>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote: > >>>>>> ----- Original Message ----- > >>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote: > >>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote: > >>>>>>>>> ----- Original Message ----- > >>>>>>>>>> This is an updated version of the patch set I sent to the > >>>>>>>>>> list > >>>>>>>>>> a > >>>>>>>>>> few > >>>>>>>>>> hours > >>>>>>>>>> ago. > >>>>>>>>>> There is now a TGSI property called > >>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES > >>>>>>>>>> that drivers can use to determine how many of the 8 > >>>>>>>>>> available > >>>>>>>>>> clip > >>>>>>>>>> distances > >>>>>>>>>> are actually used by a shader. > >>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be > >>>>>>>>> easily > >>>>>>>>> derived from the shader, and queried through > >>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ? > >>>>>>>> No. The clip distances can be indirectly addressed (there > >>>>>>>> are > >>>>>>>> up > >>>>>>>> to 2 > >>>>>>>> of them in vec4 form for a total of 8 floats), which makes > >>>>>>>> it > >>>>>>>> impossible > >>>>>>>> to determine which ones are used by analyzing the shader. > >>>>>>> The description is almost complete. :) The issue is that the > >>>>>>> shader > >>>>>>> may > >>>>>>> declare > >>>>>>> > >>>>>>> out float gl_ClipDistance[4]; > >>>>>>> > >>>>>>> the use non-constant addressing of the array. The compiler > >>>>>>> knows > >>>>>>> that > >>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc analysis > >>>>>>> would > >>>>>>> not > >>>>>>> be able to determine that. Often the fixed-function hardware > >>>>>>> (see > >>>>>>> below) needs to know which clip distance values are actually > >>>>>>> written. > >>>>>> But don't all the clip distances written by the shader need to > >>>>>> be > >>>>>> declared? > >>>>>> > >>>>>> E.g.: > >>>>>> > >>>>>> DCL OUT[0], CLIPDIST[0] > >>>>>> DCL OUT[1], CLIPDIST[1] > >>>>>> DCL OUT[2], CLIPDIST[2] > >>>>>> DCL OUT[3], CLIPDIST[3] > >>>>>> > >>>>>> therefore a trivial analysis of the declarations convey that? > >>>>> No. Clip distance is an array of up to 8 floats in GLSL, but > >>>>> it's > >>>>> represented in the hardware as 2 vec4s. You can tell by > >>>>> analyzing > >>>>> the > >>>>> declarations whether there are more than 4 clip distances in > >>>>> use, > >>>>> but > >>>>> not which components the shader writes to. > >>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in > >>>>> use, > >>>>> not > >>>>> the number of full vectors. > >>>> Lets imagine > >>>> > >>>> out float gl_ClipDistance[6]; > >>>> > >>>> Each a clip distance is a scalar float. > >>>> > >>>> Either all hardware represents the 8 clip distances as two 4 > >>>> vectors, > >>>> and we do: > >>>> > >>>> DCL OUT[0].xywz, CLIPDIST[0] > >>>> DCL OUT[1].xy, CLIPDIST[1] > >>>> > >>>> using the full range of struct tgsi_declaration::UsageMask [1] > >>>> or > >>>> we > >>>> represent them as as scalars: > >>>> > >>>> DCL OUT[0].x, CLIPDIST[0] > >>>> DCL OUT[1].x, CLIPDIST[1] > >>>> DCL OUT[2].x, CLIPDIST[2] > >>>> DCL OUT[3].x, CLIPDIST[3] > >>>> DCL OUT[4].x, CLIPDIST[4] > >>>> DCL OUT[5].x, CLIPDIST[5] > >>>> > >>>> If indirect addressing is allowed as I read bore, then maybe the > >>>> later > >>>> is better. > >>> As far as I'm aware, all hardware represents it as the former, > >>> and > >>> we > >>> have a lowering pass to fix-up the float[] accesses to be vec4[] > >>> accesses. > >> GeForce8+ = scalar architecture, no vectors, addresses are byte > >> based, > >> can access individual components just fine. > > Ok. So we should avoid baking this vec4 assumption in TGSI > > semantics. > > > >> Something like: > >> > >> gl_ClipDistance[i - 12] = some_value; > >> > >> DCL OUT[0].xyzw, POSITION > >> DCL OUT[1-8].x, CLIPDIST[0-7] > >> > >> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx > >> * ** > >> > >> * - tgsi_dimension.Index specifying the base address by > >> referencing > >> a > >> declaration > >> ** - tgsi_src_register.Index > >> > >> is the only way I see to make this work nicely on all hardware. > >> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned > >> to > >> contiguous hardware resources because of semantic.) > > I think that having indexable temps, like D3D10, would be a > > cleaner: > > The problem is that we need an indexable version of every file then > (at > least INPUT, OUTPUT), and then all the nice 32 bit structs break down > when we get more than 16 files. > > D3D doesn't have these because indirect IN/OUT isn't allowed there, > but > it is in GL and the hardware can do it.
Indirect IN/OUT is allowed on D3D9 , http://msdn.microsoft.com/en-us/library/windows/desktop/bb172963%28v=vs.85%29.aspx , but it looks like SM4 indeed doens't allow, http://msdn.microsoft.com/en-us/library/windows/desktop/ff471378%28v=VS.85%29.aspx , which means that indirect input needs spilling the inputs into a indexable temporary. > Also, having an indexable version of every file seems odd, especially > since we need a way to distinguish individual arrays inside that file > anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT > we'll need 3 indices). Fair enough. > > DCL OUT[0].xyzw, POSITION > > DCL OUT[1][0-7].x, CLIPDIST[0-7] > > > > MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx > > > > I propose we first add this new kind of temp at a first stage, then > > prohibit indirect addressing of all but this kind of temps. > > There's already TEMPORARY_ARRAY, but no one wants to use it because > it's > not clear how to distinguish individual arrays ... There's a reference implementation in src/gallium/auxiliary/tgsi/tgsi_exec.c but I can't find any use of it outside. I'm growing fonder to your proposal. Besides avoiding a new set of files, botching this into register ranges also allows a smoother transition. Instead of MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx I think this syntax would be cleare: MOV OUT[1 + (ADDR[0].x - 12)].x, TEMP[0].xxxx (no semantic change what so ever. Another approach (instead of adding a new indexing term to the indirect addressing token) would be to force the indirect register batch to be the start of a range. DCL OUT[0].xyzw, POSITION DCL OUT[1][0-7].x, CLIPDIST[0-7] DCL IMM[0], {12} SUB TEMP[2], TEMP[2], IMM[0].x ARR ADDR[0], TEMP[2] MOV OUT[ADDR[0].x + 1].x, TEMP[0].xxxx but it does seem a bit convoluted... > >> For constrained hardware the driver can build the clunky > >> > >> c := ADDR[0].x % 4 > >> i := ADDR[0].x / 4 > >> IF [c == 0] > >> MOV OUT[i].x, TEMP[0].xxxx > >> ELSE > >> IF [c == 1] > >> MOV OUT[i].y, TEMP[0].xxxx > >> ELSE > >> IF [c == 2] > >> MOV OUT[i].z, TEMP[0].xxxx > >> ELSE > >> MOV OUT[i].w, TEMP[0].xxxx > >> ENDIF > >> > >> itself. > > Sounds good plan to me. > > > > BTW, I took a look at inputs/outputs UsageMasks and although we > > don't use them, I really think we really should, as having that > > info readily accessible would allow to avoid wasting > > time/bandwidth copying attributes which are not needed. > > Yes I'm already computing these masks in my TGSI parser, getting them > from higher up would be nicer and potentially more reliable. Good. Jose _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev