Hi,

In the last week or so, I've been working on a direct translator from GLSL IR to TGSI that does not go through Mesa IR. Although it is still a work in progress, it is now working and very usable. So before I go on, here is a link to the branch I've pushed to GitHub:

https://github.com/Plombo/mesa/tree/glsl-130

My main objective with this work is to make GLSL 1.30 support feasible on Gallium drivers. From what I understand, it would be difficult or impossible to implement integer-specific opcodes such as shifting and bit masking in Mesa IR, since it only supports floats. TGSI, on the other hand, doesn't have this problem, and already supports most or all of the functionality required by GLSL 1.30.

The translator started as a modified version of ir_to_mesa, and that origin is still obvious from reading the code. Many parts of ir_to_mesa are still untouched; glsl_to_tgsi is still a long way from eliminating all traces of Mesa IR. It also contains a significant amount of code adapted from st_mesa_to_tgsi, but modified to generate TGSI code from the glsl_to_tgsi_instruction class instead of from Mesa IR. (It actually still generates Mesa IR instructions, but that code could be safely removed at some point, since the generated Mesa IR instructions are not used for anything.) I'm planning to push more of the conversion to TGSI higher up in the stack in the future, although the remnants of Mesa IR (such as the Mesa IR opcodes used by most of glsl_to_tgsi) aren't doing any harm.

Since the _mesa_optimize_program function is vital to generating optimized code with ir_to_mesa, and it is not available when not using Mesa IR, I've written some new optimization passes for glsl_to_tgsi_visitor that perform dead code elimination and consolidation of the temporary register space. Although they are rather simple, they make a huge difference in the quality of the output.
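To make the idea behind dead code elimination and temporary consolidation concrete, here is a minimal sketch in Python. It is a toy model, not the code in the branch: it assumes straight-line code (no loops or branches), treats every write as a whole-register write (no writemasks or swizzles), and the names Instr, eliminate_dead_code, and consolidate_temps are hypothetical. The live-range approach to consolidation is one plausible technique, not necessarily the one the branch uses.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

Reg = Tuple[str, int]          # (register file, index), e.g. ("TEMP", 3)


@dataclass
class Instr:                   # hypothetical stand-in for one instruction
    op: str
    dst: Reg
    srcs: List[Reg]


def eliminate_dead_code(prog: List[Instr]) -> List[Instr]:
    """Drop instructions whose TEMP result is never read afterwards."""
    live = set()               # temporaries still needed by later instructions
    kept = []
    for ins in reversed(prog):
        if ins.dst[0] == "TEMP" and ins.dst not in live:
            continue           # dead write: nothing reads this value
        live.discard(ins.dst)  # a whole-register write ends the old value
        live.update(s for s in ins.srcs if s[0] == "TEMP")
        kept.append(ins)
    kept.reverse()
    return kept


def consolidate_temps(prog: List[Instr]) -> Tuple[List[Instr], int]:
    """Renumber TEMPs so that non-overlapping live ranges share a slot."""
    last_ref: Dict[Reg, int] = {}
    for i, ins in enumerate(prog):
        for r in [ins.dst, *ins.srcs]:
            if r[0] == "TEMP":
                last_ref[r] = i

    active: Dict[Reg, int] = {}   # old temporary -> slot, while it is live
    free: List[int] = []          # min-heap of released slots
    slots_used = 0

    def rename(r: Reg) -> Reg:
        nonlocal slots_used
        if r[0] != "TEMP":
            return r
        if r not in active:
            if free:
                active[r] = heapq.heappop(free)
            else:
                active[r] = slots_used
                slots_used += 1
        return ("TEMP", active[r])

    def release(r: Reg) -> None:
        heapq.heappush(free, active.pop(r))

    out = []
    for i, ins in enumerate(prog):
        srcs = [rename(s) for s in ins.srcs]
        # Sources are read before the destination is written, so a slot
        # whose last reference is a read here can be reused for dst.
        for s in set(ins.srcs):
            if s[0] == "TEMP" and s != ins.dst and last_ref[s] == i:
                release(s)
        dst = rename(ins.dst)
        if ins.dst[0] == "TEMP" and last_ref[ins.dst] == i:
            release(ins.dst)
        out.append(Instr(ins.op, dst, srcs))
    return out, slots_used


# Small made-up program: TEMP[2] is written but never read.
demo = [
    Instr("MUL", ("TEMP", 0), [("CONST", 0), ("IN", 0)]),
    Instr("MUL", ("TEMP", 1), [("CONST", 1), ("IN", 0)]),
    Instr("MUL", ("TEMP", 2), [("TEMP", 0), ("CONST", 2)]),
    Instr("ADD", ("TEMP", 3), [("TEMP", 0), ("TEMP", 1)]),
    Instr("RSQ", ("TEMP", 4), [("TEMP", 3)]),
    Instr("MOV", ("OUT", 0), [("TEMP", 4)]),
]
packed, n_temps = consolidate_temps(eliminate_dead_code(demo))
print(len(packed), "instructions,", n_temps, "temporaries")
```

On this made-up input the dead MUL is dropped, leaving 5 instructions, and the 4 remaining temporaries fit in 2 slots. Because the sketch only looks at the first and last reference of each temporary in program order, it would not be safe as written for code with loops, where a value can stay live across a backward jump.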
As an example, here is what it generates for the vertex shader in the Mandelbrot GLSL demo from the Mesa demos repository:

VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
  0: MUL TEMP[0], CONST[4], IN[0].xxxx
  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
  5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
  6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
  8: RSQ TEMP[2].x, TEMP[2].xxxx
  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
 12: RSQ TEMP[3].x, TEMP[3].xxxx
 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
 14: MOV TEMP[3].xyz, -TEMP[2].xyzx
 15: MOV TEMP[0].xyz, -TEMP[0].xyzx
 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
 21: RSQ TEMP[4].x, TEMP[4].xxxx
 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
 29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
 30: MOV OUT[2], TEMP[0].xxxx
 31: ADD TEMP[0], IN[2], IMM[0].zzzz
 32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
 33: MOV OUT[1].xyz, TEMP[0].xyzx
 34: MUL TEMP[0], CONST[8], IN[0].xxxx
 35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
 36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
 37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
 38: MOV OUT[0], TEMP[0]
 39: END

Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi in Mesa master:

VERT
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[10]
DCL OUT[2], GENERIC[11]
DCL CONST[0..14]
DCL TEMP[0..4]
IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
  0: MUL TEMP[0], CONST[4], IN[0].xxxx
  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
  5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
  6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
  8: RSQ TEMP[2].x, TEMP[2].xxxx
  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
 12: RSQ TEMP[3].x, TEMP[3].xxxx
 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
 14: MOV TEMP[3].xyz, -TEMP[2].xyzx
 15: MOV TEMP[0].xyz, -TEMP[0].xyzx
 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
 21: RSQ TEMP[4].x, TEMP[4].xxxx
 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
 29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
 30: ADD TEMP[0], IN[2], IMM[0].zzzz
 31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
 32: MUL TEMP[0], CONST[8], IN[0].xxxx
 33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
 34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
 35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
 36: END

With neither the new optimization passes nor _mesa_optimize_program, the
shader has 44 instructions and 40 temporaries. Both optimized shaders have only 5 temporaries declared. In fact, for every shader I've tried, my register consolidation passes result in exactly the same number of temporaries being used as when _mesa_optimize_program is used. In terms of instruction count, the only visible optimization that is implemented in Mesa master but not in the GLSL IR to TGSI converter is copy propagation to output registers, which accounts for 2 of the 3 extra instructions in the st_glsl_to_tgsi version of the shader.

One current weakness of my new optimization passes is that they don't optimize code inside loops as well as they should, although to the best of my knowledge and testing they at least don't break code that uses loops.

I'd very much appreciate any comments, feedback, patches, or testing.

Regards,
Bryan