On 01/23/2013 04:42 AM, Christoph Bumiller wrote:
On 23.01.2013 01:21, Vadim Girlin wrote:
On 01/23/2013 03:59 AM, Vincent Lejeune wrote:


----- Original Message -----
From: Vadim Girlin <vadimgir...@gmail.com>
To: Christoph Bumiller <e0425...@student.tuwien.ac.at>
Cc: mesa-dev@lists.freedesktop.org
Sent: Wednesday, 23 January 2013, 0:44
Subject: Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information

On 01/22/2013 10:59 PM, Christoph Bumiller wrote:
   On 21.01.2013 21:10, Vadim Girlin wrote:
   Provide the information about indirectly addressable arrays (ranges of
   temps) in the shader to the drivers. TGSI representation itself isn't
   modified, array information is passed as an additional data in the
   pipe_shader_state, so the drivers can use it as a hint for optimization.
   ---

   It's far from being an ideal solution, but I saw the discussions about
   that problem starting from 2009 IIRC, and we still have no solution
   (neither good nor bad) despite the years passed. I hope we can use this
   not very intrusive approach until we get something better.


   I'd rather not have any hacks in the interface, let alone ones that
   solve the problem only partially (you still won't know which array is
   accessed by a particular instruction, which is important for
   optimization and essential in some cases for making INPUT/OUTPUT arrays
   work), and not just because it reduces the pressure on people to
   implement a proper solution.

   With this, you just get to know which range of TEMPs are indirectly
   addressed and which ones are not, and you can do the same by simply
   creating multiple declarations of TEMPs, one for each array, and adding
   a single bit of info to tgsi_declaration (which has 7 bits of padding
   anyway, so ample space), which is a lot less ugly, and doesn't suffer
   from an arbitrary limit, and doesn't require any modification of drivers
   either.


The array accessed by any indirect operand can be identified by the
immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting from
1, thus we can find its entry in the information provided by this patch
to get the addressable range for every indirect operand. If I'm not
missing something, glsl_to_tgsi accumulates all other parts of the
offset in the address register before the indirect access. If I'm wrong,
we can fix it to ensure such behavior.
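
As a rough illustration of the lookup described above (not code from the
patch), the matching could be sketched in Python as follows. `ArrayRange`
and `find_array_by_start` are hypothetical names, and the ranges are assumed
to be inclusive pairs of TEMP indices; the actual layout of the array
information in pipe_shader_state is not shown here.

```python
from typing import NamedTuple, Optional, Sequence


class ArrayRange(NamedTuple):
    """Hypothetical record for one indirectly addressable TEMP array."""
    first: int  # index of the first TEMP register of the array
    last: int   # index of the last TEMP register (inclusive)


def find_array_by_start(arrays: Sequence[ArrayRange],
                        imm_offset: int) -> Optional[ArrayRange]:
    """Return the array whose start equals the operand's immediate offset.

    This relies on the assumption stated in the mail: the immediate offset
    of an indirect operand is always the start of the accessed array.
    """
    for arr in arrays:
        if arr.first == imm_offset:
            return arr
    return None  # no array starts at this offset


# Made-up ranges for illustration: two arrays in TEMP[1..4] and TEMP[5..12].
arrays = [ArrayRange(first=1, last=4), ArrayRange(first=5, last=12)]

# TEMP[ADDR[0].x+1] has immediate offset 1, so it maps to the first array.
print(find_array_by_start(arrays, 1))
```

The lookup is only as reliable as the assumption behind it: an immediate
offset that is not an array start simply finds nothing.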

I'm not sure about that; when I worked on indirect addressing of const
memory, I discovered while tracking a vp/fo regression that the immediate
offset is the result of glsl_to_tgsi constant propagation and is not
related to the underlying array. This means that the dynamic index can be
negative, which is not always desirable depending on the hw. (In the R600
case, the const fetch instruction does not support a negative index; the
MOVA inst does.)

For instance, the following pseudo code snippet is fine for an index value
of -4:

uniform int index;

float array[4];
float data = array[6 + index];

and is lowered to
MOV TEMP[0] TEMP[ADDR[0].x + 6];


I tried the following shader:

uniform int index;

void main()
{
     float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
     float data = array[6 + index];
     gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
}

Resulting TGSI:

--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
DCL TEMP[2], LOCAL
DCL TEMP[3], LOCAL
DCL TEMP[4], LOCAL
DCL TEMP[5], LOCAL
DCL TEMP[6], LOCAL
DCL TEMP[7], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {    0.1000,     0.2000,     0.3000,     0.4000}
IMM[1] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
IMM[2] INT32 {6, 0, 0, 0}
   0: MOV TEMP[1].yzw, IMM[1].yxyx
   1: MOV TEMP[2], IMM[0].xxxx
   2: MOV TEMP[3], IMM[0].yyyy
   3: MOV TEMP[4], IMM[0].zzzz
   4: MOV TEMP[5], IMM[0].wwww
   5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
   6: UARL ADDR[0].x, TEMP[6].xxxx
   7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
   8: MOV_SAT OUT[0], TEMP[1]
   9: END
--------------------------------------------------------------

Also I tried the following:

uniform float array[4];
uniform int index;

void main()
{
     float data = array[6 + index];
     gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
}

Resulting TGSI:

--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..4]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
IMM[1] INT32 {6, 0, 0, 0}
   0: MOV TEMP[0].yzw, IMM[0].yxyx
   1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
   2: UARL ADDR[0].x, TEMP[1].xxxx
   3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
   4: MOV_SAT OUT[0], TEMP[0]
   5: END
--------------------------------------------------------------

So far the immediate offset in the indirect operand is always equal to the
start offset of the array. Could you please provide a more complete example
that demonstrates the problem?

Vadim


Not really, because shaders like

float array[8];

uniform int pos;

void main()
{
        array[0] = 1.0;
        array[1] = 2.0;
        array[2] = 3.0;
        array[3] = 4.0;
        gl_FragColor = vec4(array[pos - 16],
                            array[pos - 17],
                            array[pos - 18],
                            array[pos - 19]);
}

yield the terribly unoptimized

   0: MOV TEMP[1].x, IMM[0].xxxx
   1: MOV TEMP[2].x, IMM[0].yyyy
   2: MOV TEMP[3].x, IMM[0].zzzz
   3: MOV TEMP[4].x, IMM[0].wwww
   4: UADD TEMP[9].x, CONST[0].xxxx, IMM[1].xxxx
   5: UARL ADDR[0].x, TEMP[9].xxxx
   6: MOV TEMP[10].x, TEMP[ADDR[0].x+1].xxxx
   7: UADD TEMP[11].x, CONST[0].xxxx, IMM[1].yyyy
   8: UARL ADDR[0].x, TEMP[11].xxxx
   9: MOV TEMP[10].y, TEMP[ADDR[0].x+1].xxxx
  10: UADD TEMP[12].x, CONST[0].xxxx, IMM[1].zzzz
  11: UARL ADDR[0].x, TEMP[12].xxxx
  12: MOV TEMP[10].z, TEMP[ADDR[0].x+1].xxxx
  13: UADD TEMP[13].x, CONST[0].xxxx, IMM[1].wwww
  14: UARL ADDR[0].x, TEMP[13].xxxx
  15: MOV TEMP[10].w, TEMP[ADDR[0].x+1].xxxx
  16: MOV OUT[0], TEMP[10]
  17: END

instead of simply adjusting the offset and NOT emitting tons of ARLs.
But this is NOT guaranteed behaviour and neither should it be (I did
suggest that in the past, but some people disagreed and they convinced me).
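
To make the concern concrete, here is a small Python sketch (hypothetical
names, not Mesa code) of what a start-offset lookup would see if a state
tracker did fold the constant part of the index into the immediate offset.
It assumes the array occupies TEMP[1..8], as in the listing above, and that
`pos - 16` would then be folded into an immediate of 1 - 16 = -15.

```python
def folded_immediate(array_start: int, const_index: int) -> int:
    """Immediate offset if the constant part of the index is folded in."""
    return array_start + const_index


def matches_array_start(array_start: int, imm_offset: int) -> bool:
    """True if a lookup keyed on the array start would find the array."""
    return imm_offset == array_start


ARRAY_START = 1  # assumption: float array[8] lives in TEMP[1..8]

# Current output: the constant goes through UADD/UARL, immediate stays 1.
unfolded = folded_immediate(ARRAY_START, 0)
# Hypothetical optimized output for array[pos - 16]: immediate becomes -15.
folded = folded_immediate(ARRAY_START, -16)

print(unfolded, matches_array_start(ARRAY_START, unfolded))  # 1 True
print(folded, matches_array_start(ARRAY_START, folded))      # -15 False
```

With the folded form the immediate no longer identifies the array, which is
why a scheme keyed on the start offset depends on this unguaranteed
behaviour.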


I agree that it's terribly unoptimized, but it doesn't change the fact
that currently we can use the immediate offset to match it to the array
info. Is anybody going to optimize this tomorrow to break this patch?

We can discuss it forever, though, and that probably doesn't make sense.
This discussion won't be any different from the previous discussions of
that problem: the result will be the same, and we'll have no working
solution.

Vadim

Also, mesa is not the only state tracker out there, so try not to rely on
its special perks too much.

I didn't test your patch atm, but I think you may have to fix
glsl_to_tgsi.
Otherwise I'm in favor of implementing something not optimal but far
better than what we have currently.

Vincent


I'll be perfectly OK with any other solution, as long as it's a really
working (already implemented) solution that I can use today, not just
some abstract ideas in the discussions. This patch isn't perfect and can
be improved, but it already works for me. I'll be very happy to use any
other solution from you or anyone else.

Vadim



