On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes <chr...@ijw.co.nz> wrote: > A possible explanation for the perf change is that Xonotic uses > anisotropic filtering at this quality level. Lowering to txl defeats > it. I had a look at that. gl_sampler->MaxAnisotropy is never greater than 1.0 in gen7_update_sampler_state() so there is no anisotropic filtering in this case.
It makes sense to me that avoiding punting to SIMD8 helps the performance. But it is not clear to me why >10% performance change can still be observed when INTEL_DEBUG=no16 is specified. A reasonable explanation is that the image quality is degraded in some way, which is why I am still nervous about the change. An alternative approach to avoid punting seems to emulate SIMD16 sample_d with two SIMD8 sample_d. It will take longer to implement given my familiarity with the code, and may be less performant. BUt that would allow things like anisotropic filtering to be honored. > It would be worth doing an image quality comparison before and after the > change. Yeah, that is worth doing. I will do that. > > -- Chris > > On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu <olva...@gmail.com> wrote: >> sample_d is slower than the lowered version on gen7. For gen7, this improves >> Xonotic benchmark with Ultimate effects by as much as 25%: >> >> before the change: 40.06 fps >> after the change: 51.10 fps >> after the change with INTEL_DEBUG=no16: 44.46 fps >> >> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference >> was from SIMD8 versus SIMD16. If that was the case, we would want to apply >> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode. >> >> But, as the numbers show, there is still 10% improvement when SIMD16 is >> forced >> off after the change. Thus textureGrad() is lowered unconditionally for now. >> Due to this and that I haven't tried it on Haswell, this is still RFC. >> >> No piglit regressions. >> >> Signed-off-by: Chia-I Wu <olva...@gmail.com> >> --- >> .../dri/i965/brw_lower_texture_gradients.cpp | 54 >> ++++++++++++++-------- >> 1 file changed, 36 insertions(+), 18 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >> b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >> index 1589a20..f3fcb56 100644 >> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >> @@ -34,8 +34,8 @@ using namespace ir_builder; >> >> class lower_texture_grad_visitor : public ir_hierarchical_visitor { >> public: >> - lower_texture_grad_visitor(bool has_sample_d_c) >> - : has_sample_d_c(has_sample_d_c) >> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c) >> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c) >> { >> progress = false; >> } >> @@ -44,6 +44,7 @@ public: >> >> >> bool progress; >> + bool has_sample_d; >> bool has_sample_d_c; >> >> private: >> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type) >> ir_visitor_status >> lower_texture_grad_visitor::visit_leave(ir_texture *ir) >> { >> - /* Only lower textureGrad with shadow samplers */ >> - if (ir->op != ir_txd || !ir->shadow_comparitor) >> + if (ir->op != ir_txd) >> return visit_continue; >> >> - /* Lower textureGrad() with samplerCubeShadow even if we have the >> sample_d_c >> - * message. GLSL provides gradients for the 'r' coordinate. >> Unfortunately: >> - * >> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >> description: >> - * "The r coordinate contains the faceid, and the r gradients are ignored >> - * by hardware." >> - * >> - * We likely need to do a similar treatment for samplerCube and >> - * samplerCubeArray, but we have insufficient testing for that at the >> moment. >> - */ >> - bool need_lowering = !has_sample_d_c || >> - ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE; >> + bool need_lowering = false; >> + >> + if (ir->shadow_comparitor) { >> + /* Lower textureGrad() with samplerCubeShadow even if we have the >> + * sample_d_c message. GLSL provides gradients for the 'r' >> coordinate. >> + * Unfortunately: >> + * >> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >> + * description: "The r coordinate contains the faceid, and the r >> + * gradients are ignored by hardware." >> + */ >> + if (ir->sampler->type->sampler_dimensionality == >> GLSL_SAMPLER_DIM_CUBE) >> + need_lowering = true; >> + else if (!has_sample_d_c) >> + need_lowering = true; >> + } >> + else { >> + /* We likely need to do a similar treatment for samplerCube and >> + * samplerCubeArray, but we have insufficient testing for that at the >> + * moment. >> + */ >> + if (!has_sample_d) >> + need_lowering = true; >> + } >> >> if (!need_lowering) >> return visit_continue; >> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir) >> expr(ir_unop_sqrt, dot(dPdy, dPdy))); >> } >> >> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. */ >> + /* lambda_base = log2(rho). It will be biased and clamped by values >> + * defined in SAMPLER_STATE to get the final lambda. >> + */ >> ir->op = ir_txl; >> ir->lod_info.lod = expr(ir_unop_log2, rho); >> >> @@ -168,8 +182,12 @@ bool >> brw_lower_texture_gradients(struct brw_context *brw, >> struct exec_list *instructions) >> { >> + /* sample_d is slower than the lowered version on gen7, and is not >> allowed >> + * in SIMD16 mode. Treating it as unsupported improves the performance. >> + */ >> + bool has_sample_d = brw->gen != 7; >> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell; >> - lower_texture_grad_visitor v(has_sample_d_c); >> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c); >> >> visit_list_elements(&v, instructions); >> >> -- >> 1.8.3.1 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- o...@lunarg.com _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev