On 09/11/2013 10:00 PM, Chia-I Wu wrote:
> From: Chia-I Wu <o...@lunarg.com>
> Replicate the gradient of the top-left pixel to the other three pixels in the
> subspan, which is how DDY is implemented.  Before, different gradients were
> used for pixels in the top row and pixels in the bottom row.
> This change results in a less accurate approximation.  However, it improves
> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
> 95.0% confidence) on Haswell.  No noticeable image quality difference
> observed.
> No piglit gpu.tests regressions.
> I failed to come up with an explanation for the performance difference.  The
> change does not make a difference on Ivy Bridge either.  If anyone has the
> insight, please kindly enlighten me.  Performance differences may also be
> observed on other games that call textureGrad and dFdx.

After all the experiments and discussions with the hardware guys, let's
go ahead and do this.  We should do a couple of things, however.

1. Disable the optimization if the application explicitly sets

2. Add a driconf option, as suggested by Chris, to disable the optimization.

3. Use the same DDX / DDY calculation on all platforms.

4. Update the commit message and the comment in the code with the
explanation of the optimization (the HSW sample_d instruction does some
optimizations if the same LOD is used for all pixels, etc.).

> Signed-off-by: Chia-I Wu <o...@lunarg.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> index bfb3d33..c0d24a0 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>  void
>  fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>  {
> +   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
> +    * which gives much better performance when the result is used with
> +    * sample_d
> +    */
> +   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
> +                                          BRW_VERTICAL_STRIDE_2;
> +   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
> +                                        BRW_WIDTH_2;
> +
>     struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>                                BRW_REGISTER_TYPE_F,
> -                              BRW_VERTICAL_STRIDE_2,
> -                              BRW_WIDTH_2,
> +                              vstride,
> +                              width,
>                                BRW_HORIZONTAL_STRIDE_0,
>                                BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>     struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>                                BRW_REGISTER_TYPE_F,
> -                              BRW_VERTICAL_STRIDE_2,
> -                              BRW_WIDTH_2,
> +                              vstride,
> +                              width,
>                                BRW_HORIZONTAL_STRIDE_0,
>                                BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>     brw_ADD(p, dst, src0, negate(src1));
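
For anyone following along who is not used to register regions, here is a
minimal CPU-side sketch of what the two region choices in the hunk above
select.  It is not driver code.  The names read_region and ddx_simd8 are
hypothetical, and it assumes a SIMD8 payload holding two subspans with
channels ordered tl, tr, bl, br, as the comment in the patch implies.  The
region model is simplified to <vstride; width, hstride> addressing within a
single 8-float register.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: gather 8 channels from one 8-float register using a
// simplified <vstride; width, hstride> region that starts at element `start`.
// Channel ch reads element start + (ch / width) * vstride + (ch % width) * hstride.
std::array<float, 8> read_region(const std::array<float, 8> &reg,
                                 std::size_t start, std::size_t vstride,
                                 std::size_t width, std::size_t hstride)
{
   std::array<float, 8> out{};
   for (std::size_t ch = 0; ch < out.size(); ch++) {
      const std::size_t row = ch / width;
      const std::size_t col = ch % width;
      out[ch] = reg[start + row * vstride + col * hstride];
   }
   return out;
}

// Hypothetical model of the ADD emitted by generate_ddx for SIMD8.
// coarse == false models <2;2,0>: each row of a subspan gets its own gradient.
// coarse == true  models <4;4,0>: the top-row gradient (tr - tl) is replicated
// to all four pixels of the subspan.
std::array<float, 8> ddx_simd8(const std::array<float, 8> &reg, bool coarse)
{
   const std::size_t vstride = coarse ? 4 : 2;
   const std::size_t width = coarse ? 4 : 2;

   // src0 starts at element 1 (right-hand pixel), src1 at element 0
   // (left-hand pixel); hstride 0 repeats that element across the row.
   const std::array<float, 8> src0 = read_region(reg, 1, vstride, width, 0);
   const std::array<float, 8> src1 = read_region(reg, 0, vstride, width, 0);

   std::array<float, 8> dst{};
   for (std::size_t ch = 0; ch < dst.size(); ch++)
      dst[ch] = src0[ch] - src1[ch];
   return dst;
}
```

With <2;2,0> the eight channels come out as (tr - tl) twice and (br - bl)
twice per subspan.  With <4;4,0> every channel of a subspan holds that
subspan's (tr - tl), so all four pixels feed the same gradient to sample_d.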

mesa-dev mailing list
