The _mesa_optimize_program function is vital to generating optimized
code with ir_to_mesa, but it is unavailable when Mesa IR is not used, so
I've written some new optimization passes for glsl_to_tgsi_visitor that
perform dead code elimination and consolidation of the temporary
register space. Although they are rather simple, they make a large
difference in the quality of the output. As an example, here is what
glsl_to_tgsi_visitor now generates for the vertex shader in the
Mandelbrot GLSL demo from the Mesa demos repository:
VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
0: MUL TEMP[0], CONST[4], IN[0].xxxx
1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
8: RSQ TEMP[2].x, TEMP[2].xxxx
9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
12: RSQ TEMP[3].x, TEMP[3].xxxx
13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
14: MOV TEMP[3].xyz, -TEMP[2].xyzx
15: MOV TEMP[0].xyz, -TEMP[0].xyzx
16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
21: RSQ TEMP[4].x, TEMP[4].xxxx
22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
30: MOV OUT[2], TEMP[0].xxxx
31: ADD TEMP[0], IN[2], IMM[0].zzzz
32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
33: MOV OUT[1].xyz, TEMP[0].xyzx
34: MUL TEMP[0], CONST[8], IN[0].xxxx
35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
38: MOV OUT[0], TEMP[0]
39: END
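
For readers unfamiliar with the kind of pass that produced the dump
above, here is a minimal sketch of dead code elimination on
straight-line code. It is not the code from glsl_to_tgsi_visitor: the
Inst class and the eliminate_dead_code function are hypothetical names
made up for illustration, and the sketch assumes whole-register writes
(no swizzles or writemasks) and no control flow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Inst:
    """Hypothetical, simplified instruction: no swizzles or writemasks."""
    op: str
    dst: Optional[str] = None      # e.g. "TEMP[3]" or "OUT[0]"; None for END
    srcs: Tuple[str, ...] = ()


def eliminate_dead_code(insts: List[Inst]) -> List[Inst]:
    """Drop instructions whose only effect is writing a TEMP nobody reads.

    Walks the program backwards while tracking which temporaries are
    still needed. Assumes straight-line code and that every write
    replaces the whole register.
    """
    live = set()   # temporaries that a later instruction still reads
    kept = []
    for inst in reversed(insts):
        writes_temp = inst.dst is not None and inst.dst.startswith("TEMP")
        if writes_temp and inst.dst not in live:
            continue                   # result is never read: dead
        if writes_temp:
            live.discard(inst.dst)     # a full write ends the older value
        live.update(s for s in inst.srcs if s.startswith("TEMP"))
        kept.append(inst)
    kept.reverse()
    return kept
```

Because the walk is backwards, a chain of instructions that only feeds
a dead result is removed in a single pass.
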
Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
in Mesa master:
VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
0: MUL TEMP[0], CONST[4], IN[0].xxxx
1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
8: RSQ TEMP[2].x, TEMP[2].xxxx
9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
12: RSQ TEMP[3].x, TEMP[3].xxxx
13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
14: MOV TEMP[3].xyz, -TEMP[2].xyzx
15: MOV TEMP[0].xyz, -TEMP[0].xyzx
16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
21: RSQ TEMP[4].x, TEMP[4].xxxx
22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
30: ADD TEMP[0], IN[2], IMM[0].zzzz
31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
32: MUL TEMP[0], CONST[8], IN[0].xxxx
33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
36: END
With neither the new optimization passes nor _mesa_optimize_program, the
shader has 44 instructions and 40 temporaries. Both optimized shaders
declare only 5 temporaries. In fact, for every shader I've tried, my
register consolidation passes use exactly the same number of temporaries
as _mesa_optimize_program does. In terms of instruction count, the only
visible optimization that is implemented in Mesa master but not in the
GLSL IR to TGSI converter is copy propagation to output registers, which
accounts for 2 of the 3 extra instructions in the st_glsl_to_tgsi
version of the shader.

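
To illustrate what copy propagation to output registers means here,
consider the pair "MAD TEMP[0], ..." / "MOV OUT[0], TEMP[0]" at the end
of the first dump, which Mesa master emits as a single "MAD OUT[0], ...".
The following is a minimal sketch of such a fold, not the Mesa
implementation: Inst and propagate_to_outputs are hypothetical names,
and the sketch assumes straight-line code and ignores swizzles and
writemasks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Inst:
    """Hypothetical, simplified instruction: no swizzles or writemasks."""
    op: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()


def read_before_overwrite(insts: List[Inst], start: int, reg: str) -> bool:
    """True if reg is read at or after index start before being rewritten."""
    for inst in insts[start:]:
        if reg in inst.srcs:
            return True
        if inst.dst == reg:
            return False
    return False


def propagate_to_outputs(insts: List[Inst]) -> List[Inst]:
    """Fold 'OP TEMP[n], ...; MOV OUT[k], TEMP[n]' into 'OP OUT[k], ...'.

    The fold is only done when TEMP[n] is not read again afterwards,
    so dropping the write to the temporary cannot change the result.
    """
    out = []
    i = 0
    while i < len(insts):
        cur = insts[i]
        nxt = insts[i + 1] if i + 1 < len(insts) else None
        if (nxt is not None and nxt.op == "MOV"
                and nxt.dst is not None and nxt.dst.startswith("OUT")
                and cur.dst is not None and cur.dst.startswith("TEMP")
                and nxt.srcs == (cur.dst,)
                and not read_before_overwrite(insts, i + 2, cur.dst)):
            out.append(Inst(cur.op, nxt.dst, cur.srcs))
            i += 2                     # the MOV has been absorbed
        else:
            out.append(cur)
            i += 1
    return out
```

Note that the folded instruction may still read the temporary it used
to write, as the final MAD in the dumps does; only the destination
changes.
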
One current weakness of my new optimization passes is that they don't
optimize code inside loops as well as they should, although to the best
of my knowledge and testing they at least don't break code that uses
loops.

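
As background on why loops make register consolidation harder, here is
a minimal sketch of renumbering temporaries by live range. It is an
illustration under stated assumptions, not a description of the actual
passes: Inst, live_intervals and consolidate_temps are hypothetical
names, swizzles and writemasks are ignored, and any temporary that is
touched inside a BGNLOOP/ENDLOOP pair is conservatively kept alive for
the whole loop. That rule is safe but can use more registers inside
loop bodies than strictly necessary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Inst:
    """Hypothetical, simplified instruction: no swizzles or writemasks."""
    op: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()


def loop_ranges(insts: List[Inst]) -> List[Tuple[int, int]]:
    """Return (begin, end) instruction indices of every loop."""
    stack, ranges = [], []
    for i, inst in enumerate(insts):
        if inst.op == "BGNLOOP":
            stack.append(i)
        elif inst.op == "ENDLOOP":
            ranges.append((stack.pop(), i))
    return ranges


def live_intervals(insts: List[Inst]) -> Dict[str, Tuple[int, int]]:
    """Map each TEMP to its (first use, last use), widened over loops."""
    iv: Dict[str, Tuple[int, int]] = {}
    for i, inst in enumerate(insts):
        regs = list(inst.srcs) + ([inst.dst] if inst.dst else [])
        for reg in regs:
            if reg.startswith("TEMP"):
                first, _ = iv.get(reg, (i, i))
                iv[reg] = (first, i)
    for begin, end in loop_ranges(insts):
        for reg, (first, last) in list(iv.items()):
            if first <= end and last >= begin:   # interval touches the loop
                iv[reg] = (min(first, begin), max(last, end))
    return iv


def consolidate_temps(insts: List[Inst]) -> List[Inst]:
    """Renumber temporaries so non-overlapping live ranges share an index."""
    mapping: Dict[str, int] = {}
    active: List[Tuple[int, int]] = []   # (last use, assigned index)
    free: List[int] = []
    next_index = 0
    intervals = sorted(live_intervals(insts).items(), key=lambda kv: kv[1][0])
    for reg, (first, last) in intervals:
        # Release indices whose range ended strictly before this one starts.
        free += [idx for end, idx in active if end < first]
        active = [(end, idx) for end, idx in active if end >= first]
        if free:
            idx = min(free)
            free.remove(idx)
        else:
            idx = next_index
            next_index += 1
        mapping[reg] = idx
        active.append((last, idx))

    def rename(reg: str) -> str:
        return f"TEMP[{mapping[reg]}]" if reg in mapping else reg

    return [Inst(i.op, rename(i.dst) if i.dst else None,
                 tuple(rename(s) for s in i.srcs)) for i in insts]
```

On straight-line code this lets a chain of short-lived temporaries
share a couple of indices, while the same chain wrapped in a loop keeps
one index per temporary because every range is stretched to cover the
loop.
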
I'd very much appreciate any comments, feedback, patches, or testing.