On 15.12.2011 21:10, Jose Fonseca wrote: > > ----- Original Message ----- >> >> ----- Original Message ----- >>> On 15.12.2011 20:09, Jose Fonseca wrote: >>>> ----- Original Message ----- >>>>> On 12/14/2011 12:58 AM, Ian Romanick wrote: >>>>>> On 12/13/2011 01:25 PM, Jose Fonseca wrote: >>>>>>> ----- Original Message ----- >>>>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote: >>>>>>>>> ----- Original Message ----- >>>>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote: >>>>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote: >>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>> This is an updated version of the patch set I sent to >>>>>>>>>>>>> the >>>>>>>>>>>>> list >>>>>>>>>>>>> a >>>>>>>>>>>>> few >>>>>>>>>>>>> hours >>>>>>>>>>>>> ago. >>>>>>>>>>>>> There is now a TGSI property called >>>>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES >>>>>>>>>>>>> that drivers can use to determine how many of the 8 >>>>>>>>>>>>> available >>>>>>>>>>>>> clip >>>>>>>>>>>>> distances >>>>>>>>>>>>> are actually used by a shader. >>>>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be >>>>>>>>>>>> easily >>>>>>>>>>>> derived from the shader, and queried through >>>>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ? >>>>>>>>>>> No. The clip distances can be indirectly addressed (there >>>>>>>>>>> are >>>>>>>>>>> up >>>>>>>>>>> to 2 >>>>>>>>>>> of them in vec4 form for a total of 8 floats), which makes >>>>>>>>>>> it >>>>>>>>>>> impossible >>>>>>>>>>> to determine which ones are used by analyzing the shader. >>>>>>>>>> The description is almost complete. :) The issue is that >>>>>>>>>> the >>>>>>>>>> shader >>>>>>>>>> may >>>>>>>>>> declare >>>>>>>>>> >>>>>>>>>> out float gl_ClipDistance[4]; >>>>>>>>>> >>>>>>>>>> the use non-constant addressing of the array. The compiler >>>>>>>>>> knows >>>>>>>>>> that >>>>>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc >>>>>>>>>> analysis >>>>>>>>>> would >>>>>>>>>> not >>>>>>>>>> be able to determine that. Often the fixed-function >>>>>>>>>> hardware >>>>>>>>>> (see >>>>>>>>>> below) needs to know which clip distance values are >>>>>>>>>> actually >>>>>>>>>> written. >>>>>>>>> But don't all the clip distances written by the shader need >>>>>>>>> to >>>>>>>>> be >>>>>>>>> declared? >>>>>>>>> >>>>>>>>> E.g.: >>>>>>>>> >>>>>>>>> DCL OUT[0], CLIPDIST[0] >>>>>>>>> DCL OUT[1], CLIPDIST[1] >>>>>>>>> DCL OUT[2], CLIPDIST[2] >>>>>>>>> DCL OUT[3], CLIPDIST[3] >>>>>>>>> >>>>>>>>> therefore a trivial analysis of the declarations convey >>>>>>>>> that? >>>>>>>> No. Clip distance is an array of up to 8 floats in GLSL, but >>>>>>>> it's >>>>>>>> represented in the hardware as 2 vec4s. You can tell by >>>>>>>> analyzing >>>>>>>> the >>>>>>>> declarations whether there are more than 4 clip distances in >>>>>>>> use, >>>>>>>> but >>>>>>>> not which components the shader writes to. >>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components >>>>>>>> in >>>>>>>> use, >>>>>>>> not >>>>>>>> the number of full vectors. >>>>>>> Lets imagine >>>>>>> >>>>>>> out float gl_ClipDistance[6]; >>>>>>> >>>>>>> Each a clip distance is a scalar float. >>>>>>> >>>>>>> Either all hardware represents the 8 clip distances as two 4 >>>>>>> vectors, >>>>>>> and we do: >>>>>>> >>>>>>> DCL OUT[0].xywz, CLIPDIST[0] >>>>>>> DCL OUT[1].xy, CLIPDIST[1] >>>>>>> >>>>>>> using the full range of struct tgsi_declaration::UsageMask [1] >>>>>>> or >>>>>>> we >>>>>>> represent them as as scalars: >>>>>>> >>>>>>> DCL OUT[0].x, CLIPDIST[0] >>>>>>> DCL OUT[1].x, CLIPDIST[1] >>>>>>> DCL OUT[2].x, CLIPDIST[2] >>>>>>> DCL OUT[3].x, CLIPDIST[3] >>>>>>> DCL OUT[4].x, CLIPDIST[4] >>>>>>> DCL OUT[5].x, CLIPDIST[5] >>>>>>> >>>>>>> If indirect addressing is allowed as I read bore, then maybe >>>>>>> the >>>>>>> later >>>>>>> is better. >>>>>> As far as I'm aware, all hardware represents it as the former, >>>>>> and >>>>>> we >>>>>> have a lowering pass to fix-up the float[] accesses to be >>>>>> vec4[] >>>>>> accesses. >>>>> GeForce8+ = scalar architecture, no vectors, addresses are byte >>>>> based, >>>>> can access individual components just fine. >>>> Ok. So we should avoid baking this vec4 assumption in TGSI >>>> semantics. >>>> >>>>> Something like: >>>>> >>>>> gl_ClipDistance[i - 12] = some_value; >>>>> >>>>> DCL OUT[0].xyzw, POSITION >>>>> DCL OUT[1-8].x, CLIPDIST[0-7] >>>>> >>>>> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx >>>>> * ** >>>>> >>>>> * - tgsi_dimension.Index specifying the base address by >>>>> referencing >>>>> a >>>>> declaration >>>>> ** - tgsi_src_register.Index >>>>> >>>>> is the only way I see to make this work nicely on all hardware. >>>>> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned >>>>> to >>>>> contiguous hardware resources because of semantic.) >>>> I think that having indexable temps, like D3D10, would be a >>>> cleaner: >>> The problem is that we need an indexable version of every file then >>> (at >>> least INPUT, OUTPUT), and then all the nice 32 bit structs break >>> down >>> when we get more than 16 files. >>> >>> D3D doesn't have these because indirect IN/OUT isn't allowed there, >>> but >>> it is in GL and the hardware can do it. >> Indirect IN/OUT is allowed on D3D9 , >> http://msdn.microsoft.com/en-us/library/windows/desktop/bb172963%28v=vs.85%29.aspx >> , but it looks like SM4 indeed doens't allow, >> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ff471378%28v=VS.85%29.aspx >> , which means that indirect input needs spilling the inputs into a >> indexable temporary. >> >>> Also, having an indexable version of every file seems odd, >>> especially >>> since we need a way to distinguish individual arrays inside that >>> file >>> anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT >>> we'll need 3 indices). >> Fair enough. >> >>>> DCL OUT[0].xyzw, POSITION >>>> DCL OUT[1][0-7].x, CLIPDIST[0-7] >>>> >>>> MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx >>>> >>>> I propose we first add this new kind of temp at a first stage, >>>> then >>>> prohibit indirect addressing of all but this kind of temps. >>> There's already TEMPORARY_ARRAY, but no one wants to use it because >>> it's >>> not clear how to distinguish individual arrays ... >> There's a reference implementation in >> src/gallium/auxiliary/tgsi/tgsi_exec.c but I can't find any use of >> it outside. > The commit that introduced TEMPORARY_ARRAY has bit more detail: > > commit 101f792a2af9c9a19a050afba8b60caa689466a5 > Author: Zack Rusin <[email protected]> > Date: Fri Jun 18 13:41:20 2010 -0400 > > gallium: add a temporary array register file > > like normal temporaries, but allows to define a number of distinct > arrays, all of which make it explicit that they contain /indexable/ > registers. > as a side-effect we're adding support for multi-dimensional destination > registers. > The whole thing looks like this: > DCL TEMPX[0][0..128] # 0 array with 128 registers > > ADD TEMPX[0][0], IN[0], IMM[0] > ADD TEMPX[0][1], IN[0], IMM[0] > ABS OUT[0], TEMPX[0][TEMP[0]] > > But if we'd follow this route, it's still not clear what to do for > INPUT/OUTPUT/CNSTS, besides adding INPUTX,OUTPUTX,CONSTX? > > Another approach would be to make all registers bi-dimendional (as you were > hinting), but make the second index optional for single element arrays.
For compatibility, we could keep incrementing the primary (offset) index: DCL TEMP[0][0..7].xyzw DCL TEMP[1][8..11].x DCL TEMP[2][12..15].xy DCL IN[0][VERTEXCOUNT][0].xyzw, POSITION <- what I meant with 3 indices, GS or HS DCL IN[1][VERTEXCOUNT][1].xyzw, COLOR0 So that drivers unaware of the changes could keep allocating vec4s (right now an array of n indirectly accessed floats still has to be n vec4 TEMPs) for all the declarations and ignore the array index altogether. > Jose _______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
