On 07/02/2014 04:31 PM, Ian Romanick wrote:
On 07/02/2014 07:29 AM, Brian Paul wrote:
On 07/02/2014 06:49 AM, Brian Paul wrote:
On 07/02/2014 06:36 AM, Brian Paul wrote:
On 07/01/2014 08:43 PM, Michel Dänzer wrote:
On 02.07.2014 00:43, Brian Paul wrote:
Module: Mesa
Branch: master
Commit: f4b0ab7afd83c811329211eae8167c9bf238870c
URL:
https://urldefense.proofpoint.com/v1/url?u=http://cgit.freedesktop.org/mesa/mesa/commit/?id%3Df4b0ab7afd83c811329211eae8167c9bf238870c&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=lGQMzzTgII0I7jefp2FHq7WtZ%2BTLs8wadB%2BiIj9xpBY%3D%0A&m=X5dPXUSLXb%2B60RW5%2FHCgcIkPXz083MgfZAbiRJFVOxc%3D%0A&s=c9766b9e3797ccad30888c534fae5891f9a7313418a228c81c0528230412cf1c




Author: Brian Paul <bri...@vmware.com>
Date:   Tue Jul  1 08:17:09 2014 -0600

st/mesa: fix incorrect size of UBO declarations

UniformBufferSize is in bytes so we need to divide by 16 to get the
number of constant buffer slots.  Also, the ureg_DECL_constant2D()
function takes first..last parameters so we need to subtract one
for the last value.

This change broke the GLSL uniform_buffer fs/vs/gs-struct-pad piglit
tests with radeonsi:

../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi.c:306:lp_build_emit_fetch:


Assertion `reg->Register.Index <=
bld_base->info->file_max[reg->Register.File]' failed.

AFAICT reg->Register.Index is 2, and the new code calls
ureg_DECL_constant2D() with last=1. This is the TGSI code:

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[1][0..1]
DCL TEMP[0], LOCAL
IMM[0] UINT32 {0, 16, 32, 0}
    0: MOV TEMP[0].x, CONST[1][0].xxxx
    1: MOV TEMP[0].y, CONST[1][1].xxxx
    2: MOV TEMP[0].z, CONST[1][2].xxxx
    3: MOV TEMP[0].w, CONST[1][2].xxxx
    4: MOV OUT[0], TEMP[0]
    5: END

which results from this GLSL code:

#version 140

struct S1 {
          float r;
};

struct S2 {
          float g;
          float b;
          float a;
};

struct S {
         S1 s1;
         S2 s2;
};

uniform ubo1 {
          S s;
};

void main()
{
          gl_FragColor = vec4(s.s1.r, s.s2.g, s.s2.b, s.s2.a);
}


Something doesn't seem to add up, but I'm not sure where exactly the
problem is or how to fix it.


I think there's at least a one GLSL compiler bug exposed here.

  From the code fragments above, the z and w components of TEMP[0] are
getting the same value when they should not.

I reverted Mesa to before my changes and modified the piglit test to use
different values for the blue/alpha channels and it fails (nvidia
passes).  We were just getting lucky before my changes.

I suspect the code which is laying out the floats in the UBO is doing
something wrong, and the size of the UBO is miscalculated as a result.

I'll dig a bit more but I think someone more familiar with the compiler
will need to help.

The initial IR looks OK:


(loop ...

          (assign  (xyzw) (var_ref gl_Position)  (array_ref (var_ref
vertex_to_gs) (var_ref i) ) )
          (declare (temporary ) vec4 vec_ctor)
          (assign  (x) (var_ref vec_ctor)  (record_ref (record_ref
(var_ref s)  s1)  r) )
          (assign  (y) (var_ref vec_ctor)  (record_ref (record_ref
(var_ref s)  s2)  g) )
          (assign  (z) (var_ref vec_ctor)  (record_ref (record_ref
(var_ref s)  s2)  b) )
          (assign  (w) (var_ref vec_ctor)  (record_ref (record_ref
(var_ref s)  s2)  a) )
          (assign  (xyzw) (var_ref v)  (var_ref vec_ctor) )
          (call EmitVertex  ())

... endloop)


But later, after linking, the Z and W components of the vector
constructor seem to get merged.  Note: there's only 3 ubo_load
instructions now instead of four:

...
        (assign  (xyzw) (var_ref gl_Position)  (array_ref (var_ref
vertex_to_gs) (constant int (0)) ) )
        (declare (temporary ) vec4 vec_ctor)
        (declare () float cse)
        (assign  (x) (var_ref cse)  (expression float ubo_load (constant
uint (0)) (constant uint (0)) ) )
        (assign  (x) (var_ref vec_ctor)  (var_ref cse) )
        (declare () float cse@3)
        (assign  (x) (var_ref cse@3)  (expression float ubo_load
(constant uint (0)) (constant uint (16)) ) )
        (assign  (y) (var_ref vec_ctor)  (var_ref cse@3) )
        (declare () float cse@4)
        (assign  (x) (var_ref cse@4)  (expression float ubo_load
(constant uint (0)) (constant uint (32)) ) )
        (assign  (z) (var_ref vec_ctor)  (var_ref cse@4) )
        (assign  (w) (var_ref vec_ctor)  (var_ref cse@4) )
        (assign  (xyzw) (var_ref v)  (var_ref vec_ctor) )
        (assign  (xyzw) (var_ref v@5)  (var_ref v) )
        (assign  (xyzw) (var_ref gl_Position@6)  (var_ref gl_Position) )
        (emit-vertex (constant int (0)) )

I'm not sure I can debug this much further right now.

I think the problem is in lower_ubo_reference_visitor::handle_rvalue()
in lower_ubo_reference.cpp

If I disable the "const_offset = glsl_align(const_offset,
max_field_align);" near line 231 things seem to work.

Something is wrong in the logic that does struct/field alignment in that
function.

Oof.  There are a number of bugs in this area.  I've been looking at one
for the last few days, and I've just about got it resolved.  Basically.

     uniform U {
         layout(row_major) mat3x4 m;
         vec4 v;
     } u;

doesn't layout u.m as row-major.  This only occurs if there is an
instance name. :(

I can dig into this bug once I'm done with the matrix bug.

Eric wrote the original code, maybe he can take a look?

I've also been reading the GL_ARB_uniform_buffer_object spec to
understand the rules of layout "std140".  For a structure it says "the
base alignmentof the structure is <N>, where <N> is the largest base
alignment value of any of its members, and rounded up to the base
alignment of a vec4."

So it sounds like all structs should start at 16-byte boundaries.  So
for the piglit test in question, I'd expect the UBO block to be 32
bytes.  But both Mesa and nvidia compute the UBO size to be 16 bytes.

Actually, I had a mistake there. Mesa reports 32 while nvidia reports 16 bytes.


Hmm... do both drivers give 32 for

     struct S1 { float f; };
     struct S2 { vec4 v; };

     uniform U {
         S1 s1;
         S2 s2;
     };

Mesa: 32
nvidia: 32



Also, the ES3.1 and GL4.4 specs have some changes to the std140 rules to
make them more clear.  Do those specs agree with the extension spec?

I don't have time to dig into that right now.

-Brian

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to