On 16.12.2011 19:27, Ian Romanick wrote:
> On 12/13/2011 05:08 PM, Christoph Bumiller wrote:
>> On 12/14/2011 12:58 AM, Ian Romanick wrote:
>>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
>>>>
>>>> ----- Original Message -----
>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> This is an updated version of the patch set I sent to the list
>>>>>>>>>> a few hours ago.
>>>>>>>>>> There is now a TGSI property called
>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES that drivers can use to
>>>>>>>>>> determine how many of the 8 available clip distances are
>>>>>>>>>> actually used by a shader.
>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be easily
>>>>>>>>> derived from the shader, and queried through
>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
>>>>>>>> No. The clip distances can be indirectly addressed (there are up
>>>>>>>> to 2 of them in vec4 form for a total of 8 floats), which makes
>>>>>>>> it impossible to determine which ones are used by analyzing the
>>>>>>>> shader.
>>>>>>> The description is almost complete. :) The issue is that the
>>>>>>> shader may declare
>>>>>>>
>>>>>>> out float gl_ClipDistance[4];
>>>>>>>
>>>>>>> then use non-constant addressing of the array. The compiler knows
>>>>>>> that gl_ClipDistance has at most 4 elements, but post-hoc analysis
>>>>>>> would not be able to determine that. Often the fixed-function
>>>>>>> hardware (see below) needs to know which clip distance values are
>>>>>>> actually written.
>>>>>> But don't all the clip distances written by the shader need to be
>>>>>> declared?
>>>>>>
>>>>>> E.g.:
>>>>>>
>>>>>> DCL OUT[0], CLIPDIST[0]
>>>>>> DCL OUT[1], CLIPDIST[1]
>>>>>> DCL OUT[2], CLIPDIST[2]
>>>>>> DCL OUT[3], CLIPDIST[3]
>>>>>>
>>>>>> therefore a trivial analysis of the declarations conveys that?
>>>>>
>>>>> No. Clip distance is an array of up to 8 floats in GLSL, but it's
>>>>> represented in the hardware as 2 vec4s. You can tell by analyzing
>>>>> the declarations whether there are more than 4 clip distances in
>>>>> use, but not which components the shader writes to.
>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in use,
>>>>> not the number of full vectors.
>>>>
>>>> Let's imagine
>>>>
>>>> out float gl_ClipDistance[6];
>>>>
>>>> Each clip distance is a scalar float.
>>>>
>>>> Either all hardware represents the 8 clip distances as two 4-vectors,
>>>> and we do:
>>>>
>>>> DCL OUT[0].xyzw, CLIPDIST[0]
>>>> DCL OUT[1].xy, CLIPDIST[1]
>>>>
>>>> using the full range of struct tgsi_declaration::UsageMask [1] or we
>>>> represent them as scalars:
>>>>
>>>> DCL OUT[0].x, CLIPDIST[0]
>>>> DCL OUT[1].x, CLIPDIST[1]
>>>> DCL OUT[2].x, CLIPDIST[2]
>>>> DCL OUT[3].x, CLIPDIST[3]
>>>> DCL OUT[4].x, CLIPDIST[4]
>>>> DCL OUT[5].x, CLIPDIST[5]
>>>>
>>>> If indirect addressing is allowed as I read before, then maybe the
>>>> latter is better.
>>>
>>> As far as I'm aware, all hardware represents it as the former, and we
>>> have a lowering pass to fix-up the float[] accesses to be vec4[]
>>> accesses.
>>
>> GeForce8+ = scalar architecture, no vectors, addresses are byte based,
>> can access individual components just fine.
>>
>> Something like:
>>
>> gl_ClipDistance[i - 12] = some_value;
>>
>> DCL OUT[0].xyzw, POSITION
>> DCL OUT[1-8].x, CLIPDIST[0-7]
>>
>> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>>         *              **
>>
>> *  - tgsi_dimension.Index specifying the base address by referencing a
>>      declaration
>> ** - tgsi_src_register.Index
>>
>> is the only way I see to make this work nicely on all hardware.
>>
>> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned to
>> contiguous hardware resources because of their semantics.)
>>
>> For constrained hardware the driver can build the clunky
>>
>> c := ADDR[0].x % 4
>> i := ADDR[0].x / 4
>> IF [c == 0]
>> MOV OUT[i].x, TEMP[0].xxxx
>> ELSE
>> IF [c == 1]
>> MOV OUT[i].y, TEMP[0].xxxx
>> ELSE
>> IF [c == 2]
>> MOV OUT[i].z, TEMP[0].xxxx
>> ELSE
>> MOV OUT[i].w, TEMP[0].xxxx
>> ENDIF
>>
>> itself.
>
> Doing it at that low-level has a number of significant drawbacks. The
> worst is that it's long after any high-level optimizations can be done
> on the code. It also means that it has to be reimplemented in every
> driver that needs it. This really belongs at a higher level in the code.
>
> Note that the lowering pass that already exists changes the accesses to
> 'float gl_ClipDistance[8]' into accesses to 'vec4 gl_ClipDistanceMESA[2]'.
> Is there a compelling reason to not do the same at the lower level?

Of course, we can add a CAP/option to let the driver choose whether it
wants a TGSI array or some pass at a higher level to lower the
assignment.

I'd just like TGSI to be extended to be able to express what's
*actually* going on so I can produce my simple:

shl $r0, constbuf0[0], 2
store out[$r0+0x270], $r1
(0x270-0x28c are fixed locations for clip distances)

for

gl_ClipDistance[uniform int i] = some_value;
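
To make the difference concrete, here is a rough C sketch of the two index computations discussed above. It is only an illustration, not driver code: the helper names and the struct are made up for this mail, and the only hardware detail assumed is the 0x270 base offset mentioned above.

```c
#include <assert.h>

/* From the mail: clip distances live at fixed output bytes 0x270..0x28c. */
#define CLIPDIST_BASE      0x270u
#define MAX_CLIP_DISTANCES 8u

/* Scalar, byte-addressed outputs: one shift plus a constant base,
 * i.e. the "shl" / "store out[$r0+0x270]" pair.
 * (Hypothetical helper name.) */
static unsigned clipdist_byte_offset(unsigned i)
{
    assert(i < MAX_CLIP_DISTANCES);
    return CLIPDIST_BASE + (i << 2);   /* 4 bytes per float */
}

/* vec4-based outputs: the same index has to be split into a vector
 * index and a component, which is what the IF/ELSE chain selects on.
 * (Hypothetical struct and helper names.) */
struct vec4_slot {
    unsigned vec;    /* which of the two vec4s: 0 or 1      */
    unsigned comp;   /* component: 0 = x, 1 = y, 2 = z, 3 = w */
};

static struct vec4_slot clipdist_vec4_slot(unsigned i)
{
    assert(i < MAX_CLIP_DISTANCES);
    struct vec4_slot s = { i / 4, i % 4 };
    return s;
}
```

With a dynamic index the first form stays a plain address calculation, while the second needs a runtime select on the component because the write mask cannot be indexed.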