On 20 June 2012 23:35, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On 06/20/2012 04:08 PM, Paul Berry wrote: > > On i965, dFdx() and dFdy() are computed by taking advantage of the > > fact that each consecutive set of 4 pixels dispatched to the fragment > > shader always constitutes a contiguous 2x2 block of pixels in a fixed > > arrangement known as a "sub-span". So we calculate dFdx() by taking > > the difference between the values computed for the left and right > > halves of the sub-span, and we calculate dFdy() by taking the > > difference between the values computed for the top and bottom halves > > of the sub-span. > > > > However, there's a subtlety when FBOs are in use: since FBOs use a > > coordinate system where the origin is at the upper left, and window > > system framebuffers use a coordinate system where the origin is at the > > lower left, the computation of dFdy() needs to be negated for FBOs. > > Nice catch. > Thanks. I was fortunate to find it--it cropped up while writing piglit tests for MSAA centroid behaviour, since those tests run in FBOs and involve checking that derivatives work properly. > > > This patch modifies the fragment shader back-ends to negate the value > > of dFdy() when an FBO is in use. It also modifies the code that > > populates the program key (brw_wm_populate_key() and > > brw_fs_precompile()) so that they always record in the program key > > whether we are rendering to an FBO or to a window system framebuffer; > > this ensures that the fragment shader will get recompiled when > > switching between FBO and non-FBO use. > > > > This will result in unnecessary recompiles of fragment shaders that > > don't use dFdy(). To fix that, we will need to adapt the GLSL and > > NV_fragment_program front-ends to record whether or not a given shader > > uses dFdy(). I plan to implement this in a future patch series; I've > > left FIXME comments in the code as a reminder. > > Yeah, it looks like that kind of infrastructure is becoming increasingly > important. > > > Fixes Piglit test "fbo-deriv". > > --- > > src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++++++++++ > > src/mesa/drivers/dri/i965/brw_fs.h | 3 ++- > > src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 14 +++++++++++--- > > src/mesa/drivers/dri/i965/brw_wm.c | 10 ++++++++++ > > src/mesa/drivers/dri/i965/brw_wm.h | 3 ++- > > src/mesa/drivers/dri/i965/brw_wm_emit.c | 15 +++++++++++---- > > 6 files changed, 46 insertions(+), 9 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > index 313e720..43efd68 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > @@ -1848,6 +1848,13 @@ brw_fs_precompile(struct gl_context *ctx, struct > gl_shader_program *prog) > > struct brw_context *brw = brw_context(ctx); > > struct brw_wm_prog_key key; > > > > + /* As a temporary measure we assume that all programs use dFdy() > (and hence > > + * need to be compiled differently depending on whether we're > rendering to > > + * an FBO). FIXME: set this bool correctly based on the contents of > the > > + * program. > > + */ > > + bool program_uses_dfdy = true; > > + > > I would just leave this as false. Almost all fragment shaders that I've > seen don't use dPdy, so arguably false is the common case (which is what > precompile is arguably aiming for). Though, it doesn't matter > particularly much---the state settings selected here are fairly > arbitrary, and at the end of the day, the actual compilation process > will do the right thing regardless. > > Also, note that the precompile isn't used by default (unless you set the > driconf option shader_precompile = true), so you most likely won't hit > this at all. > > If you'd rather leave it, that's fine. It's not a big deal either way. > Hmm, thanks for the explanation. I didn't previously understand the purpose of precompiling. I have to admit I'm not thrilled about the code duplication between brw_fs_precompile() and brw_wm_populate_key(). In my experience, code duplication between an often-used and a rarely-used function tends to lead to bugs, because when people modify the often-used function they forget to modify the rarely-used function. (In fact it looks like your recent patch "i965: Don't set brw_wm_prog_key::iz_lookup on Gen6+." just modified brw_wm_populate_key()). I don't really have a good suggestion about how to eliminate the duplication, though. > > > if (!prog->_LinkedShaders[MESA_SHADER_FRAGMENT]) > > return true; > > > > @@ -1892,6 +1899,9 @@ brw_fs_precompile(struct gl_context *ctx, struct > gl_shader_program *prog) > > > > if (fp->Base.InputsRead & FRAG_BIT_WPOS) { > > key.drawable_height = ctx->DrawBuffer->Height; > > + } > > + > > + if ((fp->Base.InputsRead & FRAG_BIT_WPOS) || program_uses_dfdy) { > > key.render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer); > > } > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > > index 0c6c3e4..2c2cb6c 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.h > > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > > @@ -534,7 +534,8 @@ public: > > struct brw_reg src); > > void generate_discard(fs_inst *inst); > > void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg > src); > > - void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg > src); > > + void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg > src, > > + bool negate_value); > > void generate_spill(fs_inst *inst, struct brw_reg src); > > void generate_unspill(fs_inst *inst, struct brw_reg dst); > > void generate_pull_constant_load(fs_inst *inst, struct brw_reg dst); > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp > b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp > > index 522123c..0881ad7 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp > > @@ -441,8 +441,13 @@ fs_visitor::generate_ddx(fs_inst *inst, struct > brw_reg dst, struct brw_reg src) > > brw_ADD(p, dst, src0, negate(src1)); > > } > > > > +/* The negate_value boolean is used to negate the derivative > computation for > > + * FBOs, since they place the origin at the upper left instead of the > lower > > + * left. > > + */ > > void > > -fs_visitor::generate_ddy(fs_inst *inst, struct brw_reg dst, struct > brw_reg src) > > +fs_visitor::generate_ddy(fs_inst *inst, struct brw_reg dst, struct > brw_reg src, > > + bool negate_value) > > { > > struct brw_reg src0 = brw_reg(src.file, src.nr, 0, > > BRW_REGISTER_TYPE_F, > > @@ -456,7 +461,10 @@ fs_visitor::generate_ddy(fs_inst *inst, struct > brw_reg dst, struct brw_reg src) > > BRW_WIDTH_4, > > BRW_HORIZONTAL_STRIDE_0, > > BRW_SWIZZLE_XYZW, WRITEMASK_XYZW); > > - brw_ADD(p, dst, src0, negate(src1)); > > + if (negate_value) > > + brw_ADD(p, dst, src1, negate(src0)); > > + else > > + brw_ADD(p, dst, src0, negate(src1)); > > } > > > > void > > @@ -902,7 +910,7 @@ fs_visitor::generate_code() > > generate_ddx(inst, dst, src[0]); > > break; > > case FS_OPCODE_DDY: > > - generate_ddy(inst, dst, src[0]); > > + generate_ddy(inst, dst, src[0], c->key.render_to_fbo); > > break; > > > > case FS_OPCODE_SPILL: > > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c > b/src/mesa/drivers/dri/i965/brw_wm.c > > index e919e5a..300e3bb 100644 > > --- a/src/mesa/drivers/dri/i965/brw_wm.c > > +++ b/src/mesa/drivers/dri/i965/brw_wm.c > > @@ -420,6 +420,13 @@ static void brw_wm_populate_key( struct brw_context > *brw, > > GLuint line_aa; > > GLuint i; > > > > + /* As a temporary measure we assume that all programs use dFdy() > (and hence > > + * need to be compiled differently depending on whether we're > rendering to > > + * an FBO). FIXME: set this bool correctly based on the contents of > the > > + * program. > > + */ > > + bool program_uses_dfdy = true; > > + > > memset(key, 0, sizeof(*key)); > > > > /* Build the index for table lookup > > @@ -519,6 +526,9 @@ static void brw_wm_populate_key( struct brw_context > *brw, > > */ > > if (fp->program.Base.InputsRead & FRAG_BIT_WPOS) { > > key->drawable_height = ctx->DrawBuffer->Height; > > + } > > + > > + if ((fp->program.Base.InputsRead & FRAG_BIT_WPOS) || > program_uses_dfdy) { > > key->render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer); > > } > > > > diff --git a/src/mesa/drivers/dri/i965/brw_wm.h > b/src/mesa/drivers/dri/i965/brw_wm.h > > index 8f1cb8c..2cde2a0 100644 > > --- a/src/mesa/drivers/dri/i965/brw_wm.h > > +++ b/src/mesa/drivers/dri/i965/brw_wm.h > > @@ -346,7 +346,8 @@ void emit_ddxy(struct brw_compile *p, > > const struct brw_reg *dst, > > GLuint mask, > > bool is_ddx, > > - const struct brw_reg *arg0); > > + const struct brw_reg *arg0, > > + bool negate_value); > > void emit_delta_xy(struct brw_compile *p, > > const struct brw_reg *dst, > > GLuint mask, > > diff --git a/src/mesa/drivers/dri/i965/brw_wm_emit.c > b/src/mesa/drivers/dri/i965/brw_wm_emit.c > > index e27ff35..1258efe 100644 > > --- a/src/mesa/drivers/dri/i965/brw_wm_emit.c > > +++ b/src/mesa/drivers/dri/i965/brw_wm_emit.c > > @@ -457,12 +457,16 @@ void emit_frontfacing(struct brw_compile *p, > > * between each other. We could probably do it like ddx and swizzle > the right > > * order later, but bail for now and just produce > > * ((ss0.tl - ss0.bl)x4 (ss1.tl - ss1.bl)x4) > > + * > > + * The negate_value boolean is used to negate the d/dy computation for > FBOs, > > + * since they place the origin at the upper left instead of the lower > left. > > */ > > void emit_ddxy(struct brw_compile *p, > > const struct brw_reg *dst, > > GLuint mask, > > bool is_ddx, > > - const struct brw_reg *arg0) > > + const struct brw_reg *arg0, > > + bool negate_value) > > { > > int i; > > struct brw_reg src0, src1; > > @@ -498,7 +502,10 @@ void emit_ddxy(struct brw_compile *p, > > BRW_HORIZONTAL_STRIDE_0, > > BRW_SWIZZLE_XYZW, WRITEMASK_XYZW); > > } > > - brw_ADD(p, dst[i], src0, negate(src1)); > > + if (negate_value) > > + brw_ADD(p, dst[i], src1, negate(src0)); > > + else > > + brw_ADD(p, dst[i], src0, negate(src1)); > > } > > } > > if (mask & SATURATE) > > @@ -1746,11 +1753,11 @@ void brw_wm_emit( struct brw_wm_compile *c ) > > break; > > > > case OPCODE_DDX: > > - emit_ddxy(p, dst, dst_flags, true, args[0]); > > + emit_ddxy(p, dst, dst_flags, true, args[0], false); > > break; > > > > case OPCODE_DDY: > > - emit_ddxy(p, dst, dst_flags, false, args[0]); > > + emit_ddxy(p, dst, dst_flags, false, args[0], > c->key.render_to_fbo); > > break; > > > > case OPCODE_DP2: > > > > This seems like a sensible solution to the problem at hand. > > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > > It would be nice to not have to set this in the key universally, though, > as this could have a pretty big impact on actual applications (twice the > compiles, blowing out the program cache, etc.). Hate to pull out the > big hammer for a minor bug. Of course, without measuring the impact on > applications, I'm just wildly guessing :) Perhaps I can check a few > apps tomorrow. > I will be curious to hear what you find--I suspect that it's rare for apps to re-use the same fragment shader on both FBOs and the main window (and when they do, I suspect they only do this for small shaders that are inexpensive to recompile). But of course I'm guessing too :). If you do find a slowdown, that will be further motivation to get cracking on the follow-up patch, so that we only do the recompile if the shader actually uses dFdy(). > > I wonder, actually, whether compiling two (four) entirely separate > copies of the shader is really the best solution to this problem. It > might be best to simply add an implicit "uniform bool render_to_fbo" to > any shaders that use dFdy() and emit two conditional ADDs, one for the > FBO case and one for non-FBO. This would add pretty minimal overhead to > the shader, but be a lot nicer in terms of compile time and program > cache usage. > Of course, that's probably harder, so I'm not suggesting it in the short > term...more an idea to think about down the road. > Yeah, I've been thinking about this too. It seems like a good case to pilot test this idea with would be key.drawable_height, since that one could in pathological cases lead to an awful lot of recompiles.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev