On 4/27/2011 10:23 PM, Brian Paul wrote:
On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <bryanca...@gmail.com> wrote:
Hi,

In the last week or so, I've been working on a direct translator from
GLSL IR to TGSI that does not go through Mesa IR.  Although it is still
a work in progress, it is now working and very usable.  So before I go
on, here is a link to the branch I've pushed to GitHub:

https://github.com/Plombo/mesa/tree/glsl-130

My main objective with this work is to make GLSL 1.30 support feasible
on Gallium drivers.  From what I understand, it would be difficult or
impossible to implement integer-specific opcodes such as shifting and
bit masking in Mesa IR, since it only supports floats.  TGSI, on the
other hand, doesn't have this problem, and already supports most or all
of the functionality required by GLSL 1.30.
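
To illustrate the float-only limitation described above, here is a minimal C++ sketch (not Mesa code; both function names are hypothetical). It assumes IEEE 754 single precision with the default rounding mode, and shows that a 32-bit integer stored in a float register keeps only 24 significant bits, so a shift or mask applied afterwards can give the wrong answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: emulate "(x >> shift) & mask" when the only storage
// available is a 32-bit float register, as in a float-only IR.
// Only meaningful for x small enough that the float converts back in range.
uint32_t shift_mask_via_float(uint32_t x, unsigned shift, uint32_t mask)
{
    float reg = static_cast<float>(x);            // rounded to 24 significant bits
    uint32_t back = static_cast<uint32_t>(reg);   // low bits may already be lost
    return (back >> shift) & mask;
}

// The same operation with native integer instructions, for comparison.
uint32_t shift_mask_native(uint32_t x, unsigned shift, uint32_t mask)
{
    return (x >> shift) & mask;
}
```

For small values the two functions agree; for 0x01000001 (2^24 + 1) the float version loses the lowest bit, which is the kind of breakage that makes integer opcodes impractical on float-only registers.
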

Unfortunately, TGSI doesn't have everything we need yet.  There are
opcodes for binary AND, OR, XOR, etc. and a few integer operations,
but the set is incomplete.  It shouldn't be a big deal to add what's
missing, but it'll take a little time.

I think everyone agrees that we want to eventually ditch Mesa's IR.  I
_think_ that swrast is the only classic Mesa driver that still uses
Mesa IR and has neither been deprecated by a Gallium driver nor already
been weaned off Mesa IR.  How much does the i965 driver still rely on
swrast for fallbacks?  Do the Intel people see a need for a GLSL IR
executor for swrast?

I must not have noticed the integer functionality missing from TGSI. I assume the missing opcodes are just the arithmetic ones?

The translator started as a modified version of ir_to_mesa, and that
origin is still obvious from reading the code.  Many parts of ir_to_mesa
are still untouched - glsl_to_tgsi is still a long way away from
eliminating all traces of Mesa IR.  It also contains a significant
amount of code adapted from st_mesa_to_tgsi, but modified to generate
TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa
IR.  (It actually still generates Mesa IR instructions, but that could
be safely removed at some point since the generated Mesa IR instructions
are not actually used for anything.)  I'm planning to push more of the
conversion to TGSI higher up in the stack in the future, although the
remnants of Mesa IR (such as the Mesa IR opcodes used by most of
glsl_to_tgsi) aren't doing any harm.

I finally found a little time to look over your code.  As you said,
it's basically a copy & paste of the ir_to_mesa.cpp and
st_mesa_to_tgsi.c code at this time.  Do you plan to eliminate all
remnants of Mesa IR there before adding support for GLSL 1.30?  One
easy step would be to replace use of Mesa IR opcodes with TGSI opcodes
and add new TGSI opcodes for integer ops.

I do plan to eliminate the Mesa IR remnants, or the opcodes at the very least, before working on GLSL 1.30 support. The main reason I haven't replaced the Mesa IR opcodes yet is _mesa_num_src_regs and _mesa_num_dst_regs. Are there equivalents to these that work with TGSI opcodes?
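
One way to avoid depending on _mesa_num_src_regs and _mesa_num_dst_regs is a per-opcode info table that records operand counts. Gallium's TGSI auxiliary code appears to offer a lookup of this kind (tgsi_get_opcode_info() in tgsi_info.h), but that should be verified against the tree. The following is a standalone sketch of the idea only; the enum, struct, and function names are all hypothetical and are not Mesa's.

```cpp
#include <cassert>

// Hypothetical opcode enum for illustration; these are not real TGSI values.
enum SketchOpcode { OP_MOV, OP_ADD, OP_MUL, OP_MAD, OP_DP3, OP_RSQ, OP_END, OP_COUNT };

// Per-opcode metadata: mnemonic plus destination and source operand counts.
struct SketchOpcodeInfo {
    const char *mnemonic;
    unsigned num_dst;
    unsigned num_src;
};

// Table indexed by SketchOpcode; the order must match the enum above.
static const SketchOpcodeInfo sketch_info[OP_COUNT] = {
    { "MOV", 1, 1 },
    { "ADD", 1, 2 },
    { "MUL", 1, 2 },
    { "MAD", 1, 3 },
    { "DP3", 1, 2 },
    { "RSQ", 1, 1 },
    { "END", 0, 0 },
};

const SketchOpcodeInfo *sketch_get_opcode_info(SketchOpcode op)
{
    assert(static_cast<int>(op) < OP_COUNT);
    return &sketch_info[op];
}

// Drop-in style replacements for the two Mesa IR helpers named above.
unsigned sketch_num_src_regs(SketchOpcode op)
{
    return sketch_get_opcode_info(op)->num_src;
}

unsigned sketch_num_dst_regs(SketchOpcode op)
{
    return sketch_get_opcode_info(op)->num_dst;
}
```

With a table like this, a visitor can ask how many sources to walk for an instruction without consulting any Mesa IR opcode.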

Since the _mesa_optimize_program function is vital to generating
optimized code with ir_to_mesa, and it is not available when not using
Mesa IR, I've written some new optimization passes for
glsl_to_tgsi_visitor that perform dead code elimination and
consolidation of the temporary register space.  Although they are rather
simple, they do make a huge difference in the quality of the output.  As
an example, here is what it generates for the vertex shader in the
Mandelbrot GLSL demo from the Mesa demos repository:

VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
  0: MUL TEMP[0], CONST[4], IN[0].xxxx
  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
  5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
  6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
  8: RSQ TEMP[2].x, TEMP[2].xxxx
  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
  12: RSQ TEMP[3].x, TEMP[3].xxxx
  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
  21: RSQ TEMP[4].x, TEMP[4].xxxx
  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
  29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
  30: MOV OUT[2], TEMP[0].xxxx
  31: ADD TEMP[0], IN[2], IMM[0].zzzz
  32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
  33: MOV OUT[1].xyz, TEMP[0].xyzx
  34: MUL TEMP[0], CONST[8], IN[0].xxxx
  35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
  36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
  37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
  38: MOV OUT[0], TEMP[0]
  39: END

Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
in Mesa master:

VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
  0: MUL TEMP[0], CONST[4], IN[0].xxxx
  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
  5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
  6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
  8: RSQ TEMP[2].x, TEMP[2].xxxx
  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
  12: RSQ TEMP[3].x, TEMP[3].xxxx
  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
  21: RSQ TEMP[4].x, TEMP[4].xxxx
  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
  29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
  30: ADD TEMP[0], IN[2], IMM[0].zzzz
  31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
  32: MUL TEMP[0], CONST[8], IN[0].xxxx
  33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
  34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
  35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
  36: END

With neither the new optimization passes nor _mesa_optimize_program, the
shader has 44 instructions and 40 temporaries.  Both optimized shaders
have only 5 temporaries declared.  For every shader I've tried, in fact,
my register consolidation passes result in exactly the same number of
temporaries being used as when _mesa_optimize_program is used.  In terms
of instruction count, the only optimization visible that is implemented
in Mesa master but not in the GLSL IR to TGSI converter is copy
propagation to output registers, which accounts for 2 of the 3 extra
instructions in the st_glsl_to_tgsi version of the shader.
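
The copy propagation to output registers mentioned above can be sketched as follows. This is a simplified illustration, not the code in Mesa master or in the branch: it assumes whole-register granularity (no swizzles or writemasks), no control flow, and only folds a MOV into the instruction immediately before it. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum File { F_TEMP, F_OUT, F_IN, F_CONST, F_NONE };

struct Reg { File file; int index; };

struct Inst {
    std::string op;        // mnemonic, e.g. "MAD" or "MOV"
    Reg dst;               // F_NONE if the instruction writes nothing
    std::vector<Reg> src;  // source operands
};

static bool same(const Reg &a, const Reg &b)
{
    return a.file == b.file && a.index == b.index;
}

// True if r is read at or after position `from` before being overwritten.
static bool read_before_rewrite(const std::vector<Inst> &prog, size_t from, const Reg &r)
{
    for (size_t j = from; j < prog.size(); ++j) {
        for (const Reg &s : prog[j].src)
            if (same(s, r))
                return true;
        if (same(prog[j].dst, r))
            return false;  // overwritten, so the old value is dead
    }
    return false;
}

// Fold "OP TEMP[t], ...; MOV OUT[n], TEMP[t]" into "OP OUT[n], ..."
// when TEMP[t] is not needed afterwards.
void propagate_to_outputs(std::vector<Inst> &prog)
{
    for (size_t i = 1; i < prog.size(); ) {
        Inst &mov = prog[i];
        Inst &prev = prog[i - 1];
        bool foldable = mov.op == "MOV" && mov.dst.file == F_OUT &&
                        mov.src.size() == 1 && mov.src[0].file == F_TEMP &&
                        same(prev.dst, mov.src[0]) &&
                        !read_before_rewrite(prog, i + 1, mov.src[0]);
        if (foldable) {
            prev.dst = mov.dst;             // write the output directly
            prog.erase(prog.begin() + i);   // drop the now-redundant MOV
        } else {
            ++i;
        }
    }
}
```

Applied to a sequence shaped like the end of the first listing (a MAD into a temporary followed by a MOV to OUT[0]), the sketch leaves a single MAD that writes OUT[0], which is the shape of the last MAD in the second listing.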

One current weakness of my new optimization passes is that they don't
optimize code inside of loops as well as they should, although, to the
best of my knowledge and testing, they at least don't break code that
uses loops.
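
To show why loops make temporary register consolidation less effective, here is a generic sketch of the technique, not the implementation in the branch. It assumes well-nested BGNLOOP/ENDLOOP markers and whole-register granularity, and it conservatively stretches the live range of any temporary touched inside a loop to cover the entire outermost loop. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Inst {
    std::string op;        // e.g. "MUL", "BGNLOOP", "ENDLOOP"
    int dst;               // temp index written, or -1 for none
    std::vector<int> src;  // temp indices read
};

struct Range { int first = -1; int last = -1; };  // -1 means "never used"

// Conservative live range per temp: any access inside a loop counts as
// spanning the whole outermost enclosing loop.
std::vector<Range> live_ranges(const std::vector<Inst> &prog, int num_temps)
{
    const int n = static_cast<int>(prog.size());
    std::vector<int> lo(n), hi(n);
    int depth = 0, begin = -1;
    for (int i = 0; i < n; ++i) {
        if (prog[i].op == "BGNLOOP") { if (depth == 0) begin = i; ++depth; }
        lo[i] = depth > 0 ? begin : i;
        hi[i] = i;  // provisional; patched when the outermost loop closes
        if (prog[i].op == "ENDLOOP" && --depth == 0)
            for (int j = begin; j <= i; ++j) hi[j] = i;
    }
    std::vector<Range> r(num_temps);
    auto touch = [&](int t, int i) {
        if (t < 0) return;
        if (r[t].first < 0 || lo[i] < r[t].first) r[t].first = lo[i];
        if (hi[i] > r[t].last) r[t].last = hi[i];
    };
    for (int i = 0; i < n; ++i) {
        touch(prog[i].dst, i);
        for (int s : prog[i].src) touch(s, i);
    }
    return r;
}

// Linear-scan style renumbering: each temp gets the lowest new index whose
// previous occupant is dead strictly before this temp's range starts.
std::vector<int> consolidate(const std::vector<Range> &ranges)
{
    std::vector<int> order;
    for (int t = 0; t < static_cast<int>(ranges.size()); ++t)
        if (ranges[t].first >= 0) order.push_back(t);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return ranges[a].first != ranges[b].first ? ranges[a].first < ranges[b].first
                                                  : a < b;
    });
    std::vector<int> remap(ranges.size(), -1);
    std::vector<int> busy_until;  // per new register: last instruction using it
    for (int t : order) {
        std::size_t k = 0;
        while (k < busy_until.size() && busy_until[k] >= ranges[t].first) ++k;
        if (k == busy_until.size()) busy_until.push_back(0);
        busy_until[k] = ranges[t].last;
        remap[t] = static_cast<int>(k);
    }
    return remap;
}
```

On a straight-line chain of four temporaries this packs them into two registers. If the middle of the same chain sits inside a loop, the stretched ranges overlap and three registers are needed, which is one way a pass can stay correct around loops while optimizing them less well.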

I'd very much appreciate any comments, feedback, patches, or testing.

I don't have any spare time to test anything right now.  The only
feedback I have for now would be superficial (whitespace
inconsistencies, comments, etc.).  But I'm glad you're taking on this
project.

-Brian

Okay, thanks.

Bryan

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev