----- Original Message ----- > > > ----- Original Message ----- > > On 15.12.2011 20:09, Jose Fonseca wrote: > > > ----- Original Message ----- > > >> On 12/14/2011 12:58 AM, Ian Romanick wrote: > > >>> On 12/13/2011 01:25 PM, Jose Fonseca wrote: > > >>>> > > >>>> ----- Original Message ----- > > >>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote: > > >>>>>> ----- Original Message ----- > > >>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote: > > >>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote: > > >>>>>>>>> ----- Original Message ----- > > >>>>>>>>>> This is an updated version of the patch set I sent to > > >>>>>>>>>> the > > >>>>>>>>>> list > > >>>>>>>>>> a > > >>>>>>>>>> few > > >>>>>>>>>> hours > > >>>>>>>>>> ago. > > >>>>>>>>>> There is now a TGSI property called > > >>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES > > >>>>>>>>>> that drivers can use to determine how many of the 8 > > >>>>>>>>>> available > > >>>>>>>>>> clip > > >>>>>>>>>> distances > > >>>>>>>>>> are actually used by a shader. > > >>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be > > >>>>>>>>> easily > > >>>>>>>>> derived from the shader, and queried through > > >>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ? > > >>>>>>>> No. The clip distances can be indirectly addressed (there > > >>>>>>>> are > > >>>>>>>> up > > >>>>>>>> to 2 > > >>>>>>>> of them in vec4 form for a total of 8 floats), which makes > > >>>>>>>> it > > >>>>>>>> impossible > > >>>>>>>> to determine which ones are used by analyzing the shader. > > >>>>>>> The description is almost complete. :) The issue is that > > >>>>>>> the > > >>>>>>> shader > > >>>>>>> may > > >>>>>>> declare > > >>>>>>> > > >>>>>>> out float gl_ClipDistance[4]; > > >>>>>>> > > >>>>>>> the use non-constant addressing of the array. The compiler > > >>>>>>> knows > > >>>>>>> that > > >>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc > > >>>>>>> analysis > > >>>>>>> would > > >>>>>>> not > > >>>>>>> be able to determine that. Often the fixed-function > > >>>>>>> hardware > > >>>>>>> (see > > >>>>>>> below) needs to know which clip distance values are > > >>>>>>> actually > > >>>>>>> written. > > >>>>>> But don't all the clip distances written by the shader need > > >>>>>> to > > >>>>>> be > > >>>>>> declared? > > >>>>>> > > >>>>>> E.g.: > > >>>>>> > > >>>>>> DCL OUT[0], CLIPDIST[0] > > >>>>>> DCL OUT[1], CLIPDIST[1] > > >>>>>> DCL OUT[2], CLIPDIST[2] > > >>>>>> DCL OUT[3], CLIPDIST[3] > > >>>>>> > > >>>>>> therefore a trivial analysis of the declarations convey > > >>>>>> that? > > >>>>> No. Clip distance is an array of up to 8 floats in GLSL, but > > >>>>> it's > > >>>>> represented in the hardware as 2 vec4s. You can tell by > > >>>>> analyzing > > >>>>> the > > >>>>> declarations whether there are more than 4 clip distances in > > >>>>> use, > > >>>>> but > > >>>>> not which components the shader writes to. > > >>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components > > >>>>> in > > >>>>> use, > > >>>>> not > > >>>>> the number of full vectors. > > >>>> Lets imagine > > >>>> > > >>>> out float gl_ClipDistance[6]; > > >>>> > > >>>> Each a clip distance is a scalar float. > > >>>> > > >>>> Either all hardware represents the 8 clip distances as two 4 > > >>>> vectors, > > >>>> and we do: > > >>>> > > >>>> DCL OUT[0].xywz, CLIPDIST[0] > > >>>> DCL OUT[1].xy, CLIPDIST[1] > > >>>> > > >>>> using the full range of struct tgsi_declaration::UsageMask [1] > > >>>> or > > >>>> we > > >>>> represent them as as scalars: > > >>>> > > >>>> DCL OUT[0].x, CLIPDIST[0] > > >>>> DCL OUT[1].x, CLIPDIST[1] > > >>>> DCL OUT[2].x, CLIPDIST[2] > > >>>> DCL OUT[3].x, CLIPDIST[3] > > >>>> DCL OUT[4].x, CLIPDIST[4] > > >>>> DCL OUT[5].x, CLIPDIST[5] > > >>>> > > >>>> If indirect addressing is allowed as I read bore, then maybe > > >>>> the > > >>>> later > > >>>> is better. > > >>> As far as I'm aware, all hardware represents it as the former, > > >>> and > > >>> we > > >>> have a lowering pass to fix-up the float[] accesses to be > > >>> vec4[] > > >>> accesses. > > >> GeForce8+ = scalar architecture, no vectors, addresses are byte > > >> based, > > >> can access individual components just fine. > > > Ok. So we should avoid baking this vec4 assumption in TGSI > > > semantics. > > > > > >> Something like: > > >> > > >> gl_ClipDistance[i - 12] = some_value; > > >> > > >> DCL OUT[0].xyzw, POSITION > > >> DCL OUT[1-8].x, CLIPDIST[0-7] > > >> > > >> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx > > >> * ** > > >> > > >> * - tgsi_dimension.Index specifying the base address by > > >> referencing > > >> a > > >> declaration > > >> ** - tgsi_src_register.Index > > >> > > >> is the only way I see to make this work nicely on all hardware. > > >> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned > > >> to > > >> contiguous hardware resources because of semantic.) > > > I think that having indexable temps, like D3D10, would be a > > > cleaner: > > > > The problem is that we need an indexable version of every file then > > (at > > least INPUT, OUTPUT), and then all the nice 32 bit structs break > > down > > when we get more than 16 files. > > > > D3D doesn't have these because indirect IN/OUT isn't allowed there, > > but > > it is in GL and the hardware can do it. > > Indirect IN/OUT is allowed on D3D9 , > http://msdn.microsoft.com/en-us/library/windows/desktop/bb172963%28v=vs.85%29.aspx > , but it looks like SM4 indeed doens't allow, > > http://msdn.microsoft.com/en-us/library/windows/desktop/ff471378%28v=VS.85%29.aspx > , which means that indirect input needs spilling the inputs into a > indexable temporary. > > > Also, having an indexable version of every file seems odd, > > especially > > since we need a way to distinguish individual arrays inside that > > file > > anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT > > we'll need 3 indices). > > Fair enough. > > > > DCL OUT[0].xyzw, POSITION > > > DCL OUT[1][0-7].x, CLIPDIST[0-7] > > > > > > MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx > > > > > > I propose we first add this new kind of temp at a first stage, > > > then > > > prohibit indirect addressing of all but this kind of temps. > > > > There's already TEMPORARY_ARRAY, but no one wants to use it because > > it's > > not clear how to distinguish individual arrays ... > > There's a reference implementation in > src/gallium/auxiliary/tgsi/tgsi_exec.c but I can't find any use of > it outside.
The commit that introduced TEMPORARY_ARRAY has bit more detail: commit 101f792a2af9c9a19a050afba8b60caa689466a5 Author: Zack Rusin <za...@vmware.com> Date: Fri Jun 18 13:41:20 2010 -0400 gallium: add a temporary array register file like normal temporaries, but allows to define a number of distinct arrays, all of which make it explicit that they contain /indexable/ registers. as a side-effect we're adding support for multi-dimensional destination registers. The whole thing looks like this: DCL TEMPX[0][0..128] # 0 array with 128 registers ADD TEMPX[0][0], IN[0], IMM[0] ADD TEMPX[0][1], IN[0], IMM[0] ABS OUT[0], TEMPX[0][TEMP[0]] But if we'd follow this route, it's still not clear what to do for INPUT/OUTPUT/CNSTS, besides adding INPUTX,OUTPUTX,CONSTX? Another approach would be to make all registers bi-dimendional (as you were hinting), but make the second index optional for single element arrays. Jose _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev