On Thu, Apr 28, 2011 at 5:23 AM, Brian Paul <brian.e.p...@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <bryanca...@gmail.com> wrote:
> > Hi,
> >
> > In the last week or so, I've been working on a direct translator from
> > GLSL IR to TGSI that does not go through Mesa IR. Although it is still
> > a work in progress, it is now working and very usable. So before I go
> > on, here is a link to the branch I've pushed to GitHub:
> >
> > https://github.com/Plombo/mesa/tree/glsl-130
> >
> > My main objective with this work is to make GLSL 1.30 support feasible
> > on Gallium drivers. From what I understand, it would be difficult or
> > impossible to implement integer-specific opcodes such as shifting and
> > bit masking in Mesa IR, since it only supports floats. TGSI, on the
> > other hand, doesn't have this problem, and already supports most or all
> > of the functionality required by GLSL 1.30.
>
> Unfortunately, TGSI doesn't have everything we need yet. There's
> opcodes for binary AND, OR, XOR, etc. and a few integer operations,
> but it's incomplete. It shouldn't be a big deal to add what's missing
> but it'll take a little time.
>
> I think everyone agrees that we want to eventually ditch Mesa's IR. I
> _think_ that the only classic Mesa driver that uses Mesa IR and hasn't
> been deprecated by a Gallium driver, or already weaned from Mesa IR is
> swrast. How much does the i965 driver still rely on swrast for
> fallbacks? Do the Intel people see need for a GLSL IR executor for
> swrast?
>
> > The translator started as a modified version of ir_to_mesa, and that
> > origin is still obvious from reading the code. Many parts of ir_to_mesa
> > are still untouched - glsl_to_tgsi is still a long way away from
> > eliminating all traces of Mesa IR. It also contains a significant
> > amount of code adapted from st_mesa_to_tgsi, but modified to generate
> > TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa
> > IR. (It actually still generates Mesa IR instructions, but that could
> > be safely removed at some point since the generated Mesa IR instructions
> > are not actually used for anything.) I'm planning to push more of the
> > conversion to TGSI higher up in the stack in the future, although the
> > remaining remnants of Mesa IR (such as the Mesa IR opcodes used by most
> > of glsl_to_tgsi) aren't doing any harm.
>
> I finally found a little time to look over your code. As you said,
> it's basically a copy & paste of the ir_to_mesa.cpp and
> st_mesa_to_tgsi.c code at this time. Do you plan to eliminate all
> remnants of Mesa IR there before adding support for GLSL 1.30? One
> easy step would be to replace use of Mesa IR opcodes with TGSI opcodes
> and add new TGSI opcodes for integer ops.
>
> > Since the _mesa_optimize_program function is vital to generating
> > optimized code with ir_to_mesa, and it is not available when not using
> > Mesa IR, I've written some new optimization passes for
> > glsl_to_tgsi_visitor that perform dead code elimination and
> > consolidation of the temporary register space. Although they are rather
> > simple, they do make a huge difference in the quality of the output.
> > As an example, here is what it generates for the vertex shader in the
> > Mandelbrot GLSL demo from the Mesa demos repository:
> >
> > VERT
> > DCL IN[0]
> > DCL IN[1]
> > DCL IN[2]
> > DCL OUT[0], POSITION
> > DCL OUT[1], GENERIC[10]
> > DCL OUT[2], GENERIC[11]
> > DCL CONST[0..14]
> > DCL TEMP[0..4]
> > IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
> >   0: MUL TEMP[0], CONST[4], IN[0].xxxx
> >   1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> >   2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> >   3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> >   4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> >   5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> >   6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> >   7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> >   8: RSQ TEMP[2].x, TEMP[2].xxxx
> >   9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> >  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> >  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> >  12: RSQ TEMP[3].x, TEMP[3].xxxx
> >  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> >  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> >  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> >  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> >  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> >  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> >  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> >  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> >  21: RSQ TEMP[4].x, TEMP[4].xxxx
> >  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> >  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> >  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> >  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> >  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> >  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> >  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> >  29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> >  30: MOV OUT[2], TEMP[0].xxxx
> >  31: ADD TEMP[0], IN[2], IMM[0].zzzz
> >  32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
> >  33: MOV OUT[1].xyz, TEMP[0].xyzx
> >  34: MUL TEMP[0], CONST[8], IN[0].xxxx
> >  35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> >  36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> >  37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
> >  38: MOV OUT[0], TEMP[0]
> >  39: END
> >
> > Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
> > in Mesa master:
> >
> > VERT
> > DCL IN[0]
> > DCL IN[1]
> > DCL IN[2]
> > DCL OUT[0], POSITION
> > DCL OUT[1], GENERIC[10]
> > DCL OUT[2], GENERIC[11]
> > DCL CONST[0..14]
> > DCL TEMP[0..4]
> > IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
> >   0: MUL TEMP[0], CONST[4], IN[0].xxxx
> >   1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> >   2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> >   3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> >   4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> >   5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> >   6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> >   7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> >   8: RSQ TEMP[2].x, TEMP[2].xxxx
> >   9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> >  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> >  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> >  12: RSQ TEMP[3].x, TEMP[3].xxxx
> >  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> >  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> >  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> >  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> >  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> >  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> >  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> >  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> >  21: RSQ TEMP[4].x, TEMP[4].xxxx
> >  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> >  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> >  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> >  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> >  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> >  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> >  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> >  29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> >  30: ADD TEMP[0], IN[2], IMM[0].zzzz
> >  31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
> >  32: MUL TEMP[0], CONST[8], IN[0].xxxx
> >  33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> >  34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> >  35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
> >  36: END
> >
> > With neither the new optimization passes nor _mesa_optimize_program, the
> > shader has 44 instructions and 40 temporaries. Both optimized shaders
> > have only 5 temporaries declared. For every shader I've tried, in fact,
> > my register consolidation passes result in exactly the same number of
> > temporaries being used as when _mesa_optimize_program is used. In terms
> > of instruction count, the only optimization visible that is implemented
> > in Mesa master but not in the GLSL IR to TGSI converter is copy
> > propagation to output registers, which accounts for 2 of the 3 extra
> > instructions in the st_glsl_to_tgsi version of the shader.
> >
> > One current weakness of my new optimization passes is that they don't
> > optimize code inside of loops as well as they should, although at least
> > they don't break code that uses loops to the best of my knowledge and
> > testing.
> >
> > I'd very much appreciate any comments, feedback, patches, or testing.
>
> I don't have any spare time to test anything right now. The only
> feedback I have for now would be superficial (whitespace
> inconsistencies, comments, etc). But I'm glad you're taking on this
> project.
>

FWIW, in order to keep all the other drivers working, especially those
that can't support integer opcodes, there should be a way for a driver
to report that it doesn't accept those opcodes, and glsl_to_tgsi
shouldn't generate them in that case. The cap could be e.g.
PIPE_CAP_SM4, or PIPE_CAP_SHADER_MODEL returning a number >= 4.

Marek
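
To make the cap idea a bit more concrete, here is a rough sketch in C of
how such a check might look. It is only an illustration: every name in it
(hyp_screen, hyp_get_param, HYP_CAP_SHADER_MODEL, and so on) is
hypothetical and merely stands in for whatever cap and screen interface
would actually be used; none of it is existing Gallium code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap enum. HYP_CAP_SHADER_MODEL stands in for the proposed
 * PIPE_CAP_SM4 / PIPE_CAP_SHADER_MODEL idea; it is not a real cap. */
enum hyp_cap {
   HYP_CAP_SHADER_MODEL
};

/* Hypothetical, stripped-down screen object with a cap query callback. */
struct hyp_screen {
   int (*get_param)(const struct hyp_screen *screen, enum hyp_cap cap);
   int shader_model;
};

/* Example cap query: reports the shader model the driver claims. */
int
hyp_get_param(const struct hyp_screen *screen, enum hyp_cap cap)
{
   switch (cap) {
   case HYP_CAP_SHADER_MODEL:
      return screen->shader_model;
   }
   return 0;
}

/* The translator would call this and only generate integer opcodes
 * (shifts, bit masks, ...) when it returns true. */
bool
hyp_can_emit_integer_opcodes(const struct hyp_screen *screen)
{
   assert(screen != NULL && screen->get_param != NULL);
   return screen->get_param(screen, HYP_CAP_SHADER_MODEL) >= 4;
}

/* Two example drivers: one float-only, one that reports shader model 4. */
const struct hyp_screen hyp_float_only_screen = { hyp_get_param, 3 };
const struct hyp_screen hyp_sm4_screen = { hyp_get_param, 4 };
```

With something along these lines, a driver that reports a shader model
below 4 would keep getting float-only code, and one that reports 4 or
higher would get the integer opcodes.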
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev