Re: [Mesa-dev] [RFC] GL fixed function fragment shaders

Jakob Bornecrantz Fri, 18 Mar 2011 13:32:23 -0700

On Mon, Jan 17, 2011 at 10:40 PM, Eric Anholt <e...@anholt.net> wrote:
> On Thu, 13 Jan 2011 17:40:39 +0100, Roland Scheidegger <srol...@vmware.com> wrote:
>> Am 12.01.2011 23:04, schrieb Eric Anholt:
>> > This is a work-in-progress patch series to switch texenvprogram.c from
>> > generating ARB_fp style Mesa IR to generating GLSL IR as its product.
>> > For drivers without native GLSL codegen, that is then turned into the
>> > Mesa IR that can be consumed.  However, for 965 we don't use the Mesa
>> > IR product and just use the GLSL output, producing much better code
>> > thanks to the new backend.  This is part of a long term goal to get
>> > Mesa drivers off of Mesa IR and producing their instruction stream
>> > directly from the GLSL IR.
>> >
>> > I'm not planning on committing this series immediately, as I've still
>> > got a regression in the 965 driver with texrect-many on the last
>> > commit.
>> >
>> > As a comparison, here's one of the shaders from openarena before:
>>
>> So what's the code looking like after conversion to mesa IR? As long
>> as
>


[SNIP]

>
> So, there's one extra Mesa IR move added where we could compute into the
> destination reg but don't.  This is a general problem with
> ir_to_mesa.cpp that affects GLSL pretty badly.

I found pretty much the same thing when looking into tunnel:

# Fragment Program/Shader 0
 0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
 1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV TEMP[0].xyz, TEMP[1].xyzx;
 3: MOV TEMP[0].w, INPUT[1].wwww;
 4: MOV TEMP[2], TEMP[0];
 5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
 6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
 7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
 8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
 9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
12: MOV OUTPUT[2], TEMP[2];
13: END

# Fragment Program/Shader 0
 0: TXP TEMP[0], INPUT[4], texture[0], 2D;
 1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV_SAT TEMP[1].w, INPUT[1];
 3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
 4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
 5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
 6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
 7: MOV OUTPUT[2].w, TEMP[1];
 8: END

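I got similar results, though the effects are more visible here. The
first listing is the new GLSL IR path and the second is the old texenv
code. Note that the new shader takes 13 instructions and 5 temps,
compared to 8 instructions and 3 temps for the old one. I think the FF
setup here only uses fog (or one texenv modulate), so it's not just
hard-to-program texenv setups that are affected by this change.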
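
For anyone who wants to reproduce the comparison, here is a minimal sketch that tallies instructions, distinct temps and MOVs for the two listings above. The listings are copied from this mail with the instruction numbers stripped; the `summarize` helper and its regex are my own illustration, not something Mesa provides.

```python
import re

# First listing above: output of the new GLSL IR path.
NEW_SHADER = """\
TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
MUL TEMP[1].xyz, TEMP[0], INPUT[1];
MOV TEMP[0].xyz, TEMP[1].xyzx;
MOV TEMP[0].w, INPUT[1].wwww;
MOV TEMP[2], TEMP[0];
MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
MOV_SAT TEMP[3].x, TEMP[0].xxxx;
ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
MOV OUTPUT[2], TEMP[2];
END
"""

# Second listing above: output of the old texenv code.
OLD_SHADER = """\
TXP TEMP[0], INPUT[4], texture[0], 2D;
MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
MOV_SAT TEMP[1].w, INPUT[1];
MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
MOV OUTPUT[2].w, TEMP[1];
END
"""


def summarize(listing):
    """Count instructions (excluding END), distinct TEMP registers and MOVs."""
    instructions = [
        line.strip()
        for line in listing.splitlines()
        if line.strip() and line.strip() != "END"
    ]
    temps = set(re.findall(r"TEMP\[(\d+)\]", listing))
    # MOV and MOV_SAT both count as moves here.
    movs = sum(1 for line in instructions if line.split()[0].startswith("MOV"))
    return {"instructions": len(instructions), "temps": len(temps), "movs": movs}


if __name__ == "__main__":
    print("new:", summarize(NEW_SHADER))
    print("old:", summarize(OLD_SHADER))
```

The temp count here is simply the number of distinct TEMP indices that appear, which is a rough proxy for register pressure rather than a liveness analysis.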

Now, looking at how this is generated, the new code seems to generate
it quite similarly to the old. After that, though, things get
interesting: after the generation step the old code is done and is
already in the optimized form you see above. The new code, however, is
far from done. It first goes through various common GLSL IR
optimization steps (see the attached text file; the second and third
shaders in the file are the same, just with and without the GLSL IR
inlined). Finally it calls _mesa_optimize_program, which gets it to its
current form.
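
To make the difference between the two final forms concrete, here is a small sketch of the fog blend as each listing above computes it: the new path ends up with EX2 + MOV_SAT + ADD + MUL + MAD, while the old path uses EX2_SAT + LRP. The function names are made up for illustration, and I am assuming CONST[4].x holds 1.0 (which matches the constant in the GLSL IR dump) and that inputs are already in [0, 1], so the MUL_SAT on the colour in the old shader is left out.

```python
import math


def saturate(x):
    """Clamp to [0, 1], as the _SAT instruction suffix does."""
    return min(max(x, 0.0), 1.0)


def fog_blend_new(color, fog_color, fog_coord, fog_param_w):
    """Fog blend as the new listing computes it (instructions 5 to 11)."""
    t = fog_coord * fog_param_w                        # MUL
    t2 = t * t                                         # MUL
    f = saturate(2.0 ** -t2)                           # EX2, then MOV_SAT
    one_minus_f = 1.0 - f                              # ADD with CONST[4].x (assumed 1.0)
    fog_term = [fc * one_minus_f for fc in fog_color]  # MUL
    return [c * f + ft for c, ft in zip(color, fog_term)]  # MAD


def fog_blend_old(color, fog_color, fog_coord, fog_param_w):
    """Fog blend as the old listing computes it (instructions 3 to 6)."""
    t = fog_param_w * fog_coord                        # MUL
    t2 = t * t                                         # MUL
    f = saturate(2.0 ** -t2)                           # EX2_SAT
    # LRP d, a, b, c computes a * b + (1 - a) * c per component.
    return [f * c + (1.0 - f) * fc for c, fc in zip(color, fog_color)]


if __name__ == "__main__":
    color = [0.8, 0.4, 0.2]
    fog = [0.5, 0.5, 0.5]
    new = fog_blend_new(color, fog, 3.0, 0.25)
    old = fog_blend_old(color, fog, 3.0, 0.25)
    print(all(math.isclose(a, b) for a, b in zip(new, old)))
```

Both sequences compute the same value, so the extra instructions in the new path are moves and an unfused lerp rather than extra work the fixed function setup asked for.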

As for the code itself, it doesn't look as bad as I thought it would.
There are a lot of allocations and a fair bit of extra typing, though
the LOC count in the commit stays about the same or even drops. The
reason behind that is that texenv has its own implementation of ureg;
not counting that, a conversion to GLSL IR would instead add extra LOC.

>
> Of course, talking about optimality of Mesa IR is kind of a joke, as for
> the drivers that directly consume it (i915, 965 VS, r200, and I'm
> discounting r300+ as they have their own IR that Mesa IR gets translated
> to and actually optimized), we miss huge opportunities to reduce
> instruction count due to swizzle sources including -1, 0, 1 as options
> but Mesa IR not taking advantage of it.  If we were doing that right,
> then the other MOV-reduction pass would hit and that extra move just
> added here would go away, resulting in a net win.

This could be done with any of the IRs (provided numeric swizzling is
added), and it is something that I have been thinking about adding to
TGSI, as pretty much all hw supports it natively (the exception being
svga).
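
As a rough illustration of what I mean by numeric swizzling, the sketch below lets each swizzle lane select either a source component or a literal 0 or 1. The `apply_swizzle` name and the "xyzw01" string encoding are hypothetical and only for this example; this is not how TGSI or Mesa IR encodes swizzles.

```python
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def apply_swizzle(src, swizzle, negate=False):
    """Apply a 4-lane swizzle where each lane is one of x, y, z, w, 0 or 1.

    src is a sequence of four floats; swizzle is a 4-character string.
    With negate=True every selected lane is negated, so "1111" yields -1.
    """
    out = []
    for sel in swizzle:
        if sel == "0":
            value = 0.0
        elif sel == "1":
            value = 1.0
        else:
            value = src[LANES[sel]]
        out.append(-value if negate else value)
    return out


if __name__ == "__main__":
    reg = [0.3, 0.7, 0.1, 0.9]
    print(apply_swizzle(reg, "xyz1"))        # pad a vec3 with w = 1
    print(apply_swizzle(reg, "1111"))        # constant one, no CONST slot needed
    print(apply_swizzle(reg, "1111", True))  # constant minus one
```

With something like this, the ADD at instruction 9 of the new listing would not need to read CONST[4] to get its 1.0, and a MOV that only exists to fill in a constant lane could be folded into the consuming instruction's source swizzle.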

>
> Similarly, we add an extra indirection phase according to 915's
> accounting of those on the second shader, but the fact that we don't
> schedule those in our GLSL output anyway is a big issue for GLSL on
> hardware with indirection limits.
>
>> it's not worse than the original I guess this should be ok, though for
>> those drivers consuming mesa IR I guess it's just more cpu time without
>> any real benefit?
>
> Assuming that the setup the app did was already optimal for a
> programmable GPU, yes.  But I suspect that isn't generally the case --
> while OA has reasonable looking fixed function setup (other than Mesa IR
> we produce not using the swizzles), given how painful it is to program
> using texenv I suspect there are a lot of "suboptimal" shader setups out
> there that we could actually improve.

You posted some GLSL IR CPU optimization patches after pushing this
code, and only gave the delta between pre and post optimizations. What
is the delta between the old Mesa IR code and the GLSL IR code? If you
didn't do any testing, can you give an estimate? We seem to be doing a
lot more CPU crunching for worse results.

>> For gallium we should probably address this some way
>> or another, it seems quite backward to do ff->glsl->mesa ir->tgsi.
>
> I'm surprised you guys haven't forked off ir_to_mesa.cpp to something
> that produces TGSI, since you seem to prefer it as the thing for drivers
> to consume over GLSL IR.  At least with sized variables, you could then
> adapt the Mesa IR optimization passes on TGSI so that they wouldn't all
> be disabled whenever relative addressing occurred.  I'm only interested
> in Mesa IR for hardware that doesn't have relative addressing of temps,
> so it's not really an issue to me.

While an ir_to_tgsi is needed, I'm quite worried that the old
_mesa_optimize_program was needed at all to even get it close to
comparable output.

Cheers, Jakob.

GLSL IR for linked fragment program 0:
(
(declare (uniform ) sampler2D sampler_0@0x23c7fd0)
(declare (out ) vec4 gl_FragColor@0x23cbde0)
(declare (in ) vec4 gl_Color@0x23cbf00)
(declare (in ) float gl_FogFragCoord@0x23cc020)
(declare (uniform ) vec4 gl_MESAFogParamsOptimized@0x23ccaa0)
(declare (uniform ) gl_FogParameters gl_Fog@0x23ce9f0)
(declare (in ) (array vec4 1) gl_TexCoord@0x23cea80)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 texenv_combine@0x23ceeb0)
      (assign  (xyz) (var_ref texenv_combine@0x23ceeb0)  (swiz xyz (expression vec4 * (tex (var_ref sampler_0@0x23c7fd0)  (swiz xy (array_ref (var_ref gl_TexCoord@0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref gl_TexCoord@0x23cea80) (constant uint (0)) ) ) () )(var_ref gl_Color@0x23cbf00) ) )) 
      (assign  (w) (var_ref texenv_combine@0x23ceeb0)  (swiz w (var_ref gl_Color@0x23cbf00) )) 
      (declare () vec4 fog_result@0x23cf2f0)
      (assign  (xyzw) (var_ref fog_result@0x23cf2f0)  (var_ref texenv_combine@0x23ceeb0) ) 
      (declare () float fog_factor@0x23cf400)
      (declare () float fog_temp@0x23cf490)
      (assign  (x) (var_ref fog_temp@0x23cf490)  (expression float * (var_ref gl_FogFragCoord@0x23cc020) (swiz w (var_ref gl_MESAFogParamsOptimized@0x23ccaa0) )) ) 
      (assign  (x) (var_ref fog_factor@0x23cf400)  (expression float max (expression float min (expression float exp2 (expression float neg (expression float * (var_ref fog_temp@0x23cf490) (var_ref fog_temp@0x23cf490) ) ) ) (constant float (1.000000)) ) (constant float (0.000000)) ) ) 
      (assign  (xyz) (var_ref fog_result@0x23cf2f0)  (expression vec3 + (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog@0x23ce9f0)  color) )(expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor@0x23cf400) ) ) ) (expression vec3 * (swiz xyz (var_ref texenv_combine@0x23ceeb0) )(var_ref fog_factor@0x23cf400) ) ) ) 
      (assign  (xyzw) (var_ref gl_FragColor@0x23cbde0)  (var_ref fog_result@0x23cf2f0) ) 
    ))

)


)

Mesa IR for linked fragment program 0:
  0: (declare (uniform ) gl_FogParameters gl_Fog@0x23ce9f0)
     MOV TEMP[1], STATE[2];
  1: MOV TEMP[2], STATE[3].xxxx;
  2: MOV TEMP[3], STATE[3].yyyy;
  3: MOV TEMP[4], STATE[3].zzzz;
  4: MOV TEMP[5], STATE[3].wwww;
  5: (tex (var_ref sampler_0@0x23c7fd0)  (swiz xy (array_ref (var_ref 
gl_TexCoord@0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref 
gl_TexCoord@0x23cea80) (constant uint (0)) ) ) () )
     MOV TEMP[6], INPUT[4].xyyy;
  6: MOV TEMP[6].w, INPUT[4].wwww;
  7: TXP TEMP[7], INPUT[4].xyyw, texture[0], 2D;
  8: (expression vec4 * (tex (var_ref sampler_0@0x23c7fd0)  (swiz xy (array_ref 
(var_ref gl_TexCoord@0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref 
(var_ref gl_TexCoord@0x23cea80) (constant uint (0)) ) ) () )(var_ref 
gl_Color@0x23cbf00) ) 
     MUL TEMP[8], TEMP[7], INPUT[1];
  9: (assign  (xyz) (var_ref texenv_combine@0x23ceeb0)  (swiz xyz (expression 
vec4 * (tex (var_ref sampler_0@0x23c7fd0)  (swiz xy (array_ref (var_ref 
gl_TexCoord@0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref 
gl_TexCoord@0x23cea80) (constant uint (0)) ) ) () )(var_ref gl_Color@0x23cbf00) 
) )) 
     MOV TEMP[9].xyz, TEMP[8].xyzx;
 10: (assign  (w) (var_ref texenv_combine@0x23ceeb0)  (swiz w (var_ref 
gl_Color@0x23cbf00) )) 
     MOV TEMP[9].w, INPUT[1].wwww;
 11: (assign  (xyzw) (var_ref fog_result@0x23cf2f0)  (var_ref 
texenv_combine@0x23ceeb0) ) 
     MOV TEMP[10], TEMP[9];
 12: (expression float * (var_ref gl_FogFragCoord@0x23cc020) (swiz w (var_ref 
gl_MESAFogParamsOptimized@0x23ccaa0) )) 
     MUL TEMP[11].x, INPUT[3].xxxx, STATE[1].wwww;
 13: (assign  (x) (var_ref fog_temp@0x23cf490)  (expression float * (var_ref 
gl_FogFragCoord@0x23cc020) (swiz w (var_ref 
gl_MESAFogParamsOptimized@0x23ccaa0) )) ) 
     MOV TEMP[12], TEMP[11].xxxx;
 14: (expression float * (var_ref fog_temp@0x23cf490) (var_ref 
fog_temp@0x23cf490) ) 
     MUL TEMP[13].x, TEMP[11].xxxx, TEMP[11].xxxx;
 15: (expression float exp2 (expression float neg (expression float * (var_ref 
fog_temp@0x23cf490) (var_ref fog_temp@0x23cf490) ) ) ) 
     EX2 TEMP[15].x, TEMP[13].-x-x-x-x;
 16: (expression float max (expression float min (expression float exp2 
(expression float neg (expression float * (var_ref fog_temp@0x23cf490) (var_ref 
fog_temp@0x23cf490) ) ) ) (constant float (1.000000)) ) (constant float 
(0.000000)) ) 
     MOV_SAT TEMP[16], TEMP[15].xxxx;
 17: (assign  (x) (var_ref fog_factor@0x23cf400)  (expression float max 
(expression float min (expression float exp2 (expression float neg (expression 
float * (var_ref fog_temp@0x23cf490) (var_ref fog_temp@0x23cf490) ) ) ) 
(constant float (1.000000)) ) (constant float (0.000000)) ) ) 
     MOV TEMP[17], TEMP[16].xxxx;
 18: (expression float + (constant float (1.000000)) (expression float neg 
(var_ref fog_factor@0x23cf400) ) ) 
     ADD TEMP[19].x, CONST[4].xxxx, TEMP[16].-x-x-x-x;
 19: (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog@0x23ce9f0)  
color) )(expression float + (constant float (1.000000)) (expression float neg 
(var_ref fog_factor@0x23cf400) ) ) ) 
     MUL TEMP[20].xyz, STATE[2].xyzz, TEMP[19].xxxx;
 20: (expression vec3 + (expression vec3 * (swiz xyz (record_ref (var_ref 
gl_Fog@0x23ce9f0)  color) )(expression float + (constant float (1.000000)) 
(expression float neg (var_ref fog_factor@0x23cf400) ) ) ) (expression vec3 * 
(swiz xyz (var_ref texenv_combine@0x23ceeb0) )(var_ref fog_factor@0x23cf400) ) 
) 
     MAD TEMP[21], TEMP[8].xyzz, TEMP[16].xxxx, TEMP[20].xyzz;
 21: (assign  (xyz) (var_ref fog_result@0x23cf2f0)  (expression vec3 + 
(expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog@0x23ce9f0)  color) 
)(expression float + (constant float (1.000000)) (expression float neg (var_ref 
fog_factor@0x23cf400) ) ) ) (expression vec3 * (swiz xyz (var_ref 
texenv_combine@0x23ceeb0) )(var_ref fog_factor@0x23cf400) ) ) ) 
     MOV TEMP[10].xyz, TEMP[21].xyzx;
 22: (assign  (xyzw) (var_ref gl_FragColor@0x23cbde0)  (var_ref 
fog_result@0x23cf2f0) ) 
     MOV OUTPUT[2], TEMP[10];
 23: END

Mesa IR pre Mesa IR optimizations
# Fragment Program/Shader 0
  0: MOV TEMP[1], STATE[2];
  1: MOV TEMP[2], STATE[3].xxxx;
  2: MOV TEMP[3], STATE[3].yyyy;
  3: MOV TEMP[4], STATE[3].zzzz;
  4: MOV TEMP[5], STATE[3].wwww;
  5: MOV TEMP[6], INPUT[4].xyyy;
  6: MOV TEMP[6].w, INPUT[4].wwww;
  7: TXP TEMP[7], INPUT[4].xyyw, texture[0], 2D;
  8: MUL TEMP[8], TEMP[7], INPUT[1];
  9: MOV TEMP[9].xyz, TEMP[8].xyzx;
 10: MOV TEMP[9].w, INPUT[1].wwww;
 11: MOV TEMP[10], TEMP[9];
 12: MUL TEMP[11].x, INPUT[3].xxxx, STATE[1].wwww;
 13: MOV TEMP[12], TEMP[11].xxxx;
 14: MUL TEMP[13].x, TEMP[11].xxxx, TEMP[11].xxxx;
 15: EX2 TEMP[15].x, TEMP[13].-x-x-x-x;
 16: MOV_SAT TEMP[16], TEMP[15].xxxx;
 17: MOV TEMP[17], TEMP[16].xxxx;
 18: ADD TEMP[19].x, CONST[4].xxxx, TEMP[16].-x-x-x-x;
 19: MUL TEMP[20].xyz, STATE[2].xyzz, TEMP[19].xxxx;
 20: MAD TEMP[21], TEMP[8].xyzz, TEMP[16].xxxx, TEMP[20].xyzz;
 21: MOV TEMP[10].xyz, TEMP[21].xyzx;
 22: MOV OUTPUT[2], TEMP[10];
 23: END

Mesa IR post Mesa IR optimizations
# Fragment Program/Shader 0
  0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
  1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
  2: MOV TEMP[0].xyz, TEMP[1].xyzx;
  3: MOV TEMP[0].w, INPUT[1].wwww;
  4: MOV TEMP[2], TEMP[0];
  5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
  6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
  7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
  8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
  9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
 10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
 11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
 12: MOV OUTPUT[2], TEMP[2];
 13: END

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
