Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information

Christian König Wed, 23 Jan 2013 02:31:38 -0800

On 23.01.2013 03:18, Tom Stellard wrote:

On Wed, Jan 23, 2013 at 02:20:21AM +0100, Christoph Bumiller wrote:

On 23.01.2013 02:07, Vadim Girlin wrote:

On 01/23/2013 04:42 AM, Christoph Bumiller wrote:

On 23.01.2013 01:21, Vadim Girlin wrote:

On 01/23/2013 03:59 AM, Vincent Lejeune wrote:


----- Mail original -----

De : Vadim Girlin <vadimgir...@gmail.com>
À : Christoph Bumiller <e0425...@student.tuwien.ac.at>
Cc : mesa-dev@lists.freedesktop.org
Envoyé le : Mercredi 23 janvier 2013 0h44
Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array
information

On 01/22/2013 10:59 PM, Christoph Bumiller wrote:

    On 21.01.2013 21:10, Vadim Girlin wrote:

    Provide the information about indirectly addressable arrays
(ranges of

temps) in

    the shader to the drivers. TGSI representation itself isn't

modified, array

    information is passed as an additional data in the
pipe_shader_state,

so the

    drivers can use it as a hint for optimization.
    ---

    It's far from being an ideal solution, but I saw the discussions

about that

    problem starting from 2009 IIRC, and we still have no solution
(neither

good

    nor bad) despite the years passed. I hope we can use this not
very

intrusive

    approach until we get something better.

    I'd rather not have any hacks in the interface, let alone ones
that
    solve the problem only partially (you still won't know which
array is
    accessed by a particular instruction, which is important for
    optimization and essential in some cases for making INPUT/OUTPUT
arrays
    work), and not just because it reduces the pressure on people to
    implement a proper solution.

    With this, you just get to know which range of TEMPs are
indirectly
    addressed and which ones are not, and you can do the same by
simply
    creating multiple declarations of TEMPs, one for each array, and
adding
    a single bit of info to tgsi_declaration (which has 7 bits of
padding
    anyway, so ample space), which is a lot less ugly, and doesn't
suffer
    from an arbitrary limit, and doesn't require any modification of

drivers

    either.

Array accessed by any indirect operand can be identified by the
immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting
from
1, thus we can find it's entry in the information provided by this
patch
to get the addressable range for every indirect operand. If I'm not
missing something, glsl_to_tgsi accumulates all other parts of the
offset in the address register before the indirect access. If I'm
wrong,
we can fix it to ensure such behavior.

I'm not sure about that ; when I worked on indirect addressing of
const memory,
I discovered when tracking vp/fo regression that the immediate offset
is the result of
    glsl_to_tgsi constant propagation and not related to the underlying
array.
This means that the dynamic index can be negative, which is not always
desirable depending on the hw. (In R600 case, const fetch instruction
does not
support negative index. MOVA inst does).

For instance, the following pseudo code snippet is fine for an index
value of -4 :

uniform int index;

float array[4];
float data = array[6 + index];

and is lowered to
MOV TEMP[0] TEMP[ADDR[0].x + 6];

I tried the following shader:

uniform int index;

void main()
{
      float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
      float data = array[6 + index];
      gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
}

Resulting TGSI:

--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
DCL TEMP[2], LOCAL
DCL TEMP[3], LOCAL
DCL TEMP[4], LOCAL
DCL TEMP[5], LOCAL
DCL TEMP[6], LOCAL
DCL TEMP[7], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {    0.1000,     0.2000,     0.3000,     0.4000}
IMM[1] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
IMM[2] INT32 {6, 0, 0, 0}
    0: MOV TEMP[1].yzw, IMM[1].yxyx
    1: MOV TEMP[2], IMM[0].xxxx
    2: MOV TEMP[3], IMM[0].yyyy
    3: MOV TEMP[4], IMM[0].zzzz
    4: MOV TEMP[5], IMM[0].wwww
    5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
    6: UARL ADDR[0].x, TEMP[6].xxxx
    7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
    8: MOV_SAT OUT[0], TEMP[1]
    9: END
--------------------------------------------------------------

Also I tried the following:

uniform float array[4];
uniform int index;

void main()
{
      float data = array[6 + index];
      gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
}

Resulting TGSI:

--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..4]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
IMM[1] INT32 {6, 0, 0, 0}
    0: MOV TEMP[0].yzw, IMM[0].yxyx
    1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
    2: UARL ADDR[0].x, TEMP[1].xxxx
    3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
    4: MOV_SAT OUT[0], TEMP[0]
    5: END
--------------------------------------------------------------

So far immediate offset in the indirect operand is always equal to the
start offset of the array. Could you provide some more complete example
that demonstrates the problem, please.

Vadim

Not really, because shaders like

float array[8];

uniform int pos;

void main()
{
     array[0] = 1.0;
     array[1] = 2.0;
     array[2] = 3.0;
     array[3] = 4.0;
     gl_FragColor = vec4(array[pos - 16],
                         array[pos - 17],
                 array[pos - 18],
                 array[pos - 19]);
}

yield the terribly unoptimized

    0: MOV TEMP[1].x, IMM[0].xxxx
    1: MOV TEMP[2].x, IMM[0].yyyy
    2: MOV TEMP[3].x, IMM[0].zzzz
    3: MOV TEMP[4].x, IMM[0].wwww
    4: UADD TEMP[9].x, CONST[0].xxxx, IMM[1].xxxx
    5: UARL ADDR[0].x, TEMP[9].xxxx
    6: MOV TEMP[10].x, TEMP[ADDR[0].x+1].xxxx
    7: UADD TEMP[11].x, CONST[0].xxxx, IMM[1].yyyy
    8: UARL ADDR[0].x, TEMP[11].xxxx
    9: MOV TEMP[10].y, TEMP[ADDR[0].x+1].xxxx
   10: UADD TEMP[12].x, CONST[0].xxxx, IMM[1].zzzz
   11: UARL ADDR[0].x, TEMP[12].xxxx
   12: MOV TEMP[10].z, TEMP[ADDR[0].x+1].xxxx
   13: UADD TEMP[13].x, CONST[0].xxxx, IMM[1].wwww
   14: UARL ADDR[0].x, TEMP[13].xxxx
   15: MOV TEMP[10].w, TEMP[ADDR[0].x+1].xxxx
   16: MOV OUT[0], TEMP[10]
   17: END

instead of simply adjusting the offset and NOT emitting tons of ARLs.
But this is NOT guaranteed behaviour and neither should it be (I did
suggest that in the past, but some people disagreed and they convinced
me).

I agree that it's terribly unoptimized, but it doesn't change the fact
that currently we can use immediate offset to match it to the array
info. Is anybody going to optimize this tomorrow to break this patch?

If you intend to mandate this behaviour so that drivers can rely on it,
you forgot to place a big fat warning somewhere shader authors can see
it. Having this kind of behaviour and in addition leaving it
undocumented is unacceptable.

We can discuss it forever though and probably it doesn't makes sense.
This discussion won't be any different from the previous discussions of
that problem - the result will be same - we'll have no working solution.

I prefer no solution at all over a bad one that makes it easier for
people to ignore the issue and thus increase the probability of this
never being fixed.

Indirect TEMPs aren't that common, you'll survive until there is a poper
solution.
(And even when they're present, a lot of the direct TEMP accesses will
likely be saved by a pass that promotes memory access to registers.)

Doesn't anyone work on gallium+compiler full time anymore who can do
this properly ?

I think it would be helpful if we could agree on a solution and document
that decision so that someone who was interested in implementing this
correctly would know what to do and not have to worry about wasting time
on something that wouldn't be accepted.

So far we have 4 proposed solutions:

1. Pass additional array information in pipe_shader_state

    http://lists.freedesktop.org/archives/mesa-dev/2013-January/033111.html

2. Store all values that may be accessed by indirect addressing in
    the TGSI_FILE_TEMPORARY_ARRAY file.

    http://lists.freedesktop.org/archives/mesa-dev/2012-November/029575.html

3. Use temporary register range declaration to distinguish between array
    objects and then identify the objects using a constant indirect
    offset.

    http://lists.freedesktop.org/archives/mesa-dev/2012-November/029764.html

4. Clearly define arrays in the TGSI declarations with a numeric
    identifier.

    http://lists.freedesktop.org/archives/mesa-dev/2012-November/030476.html

It doesn't seem like we can come to a consensus on any one
implementation, so let's indicate our preferences like this:

+ Identify your preferred solution
+ Identify all solutions that you would be acceptable to you, even if you
   think they have some flaws.

Maybe this will help us choose a solution (If it doesn't then we
can just ask Brian to decide).

For me, I prefer solution #4 even though it is the most work, but
solution #3 would be acceptable to me.

What does everyone else think?

Definitely solution #4. Solution #3 would only be acceptable if we hadsome time pressure, but I don't see that right now.

Also I don't think solution #4 would be so much work to do, we only needto emit each array separately (something which the available patchesshould already do), and then store the resulting index into"DimensionIndex".

That behavior should of course be driver specific until all driversunderstand that new form of addressing.


Christian.


-Tom
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information

Reply via email to