This makes a lot of sense.

Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>

On Tue, Aug 16, 2016 at 1:54 PM, Francisco Jerez <curroje...@riseup.net> wrote:

> ANY4H is more efficient than ANY8H and ANY16H because it makes sure
> that whenever a whole subspan hits a discard statement it gets
> disabled by the EU until the end of the program, regardless of whether
> the discard condition is uniform across all channels of the SIMD8-16
> thread.  OTOH ANY8H/ANY16H would cause the rest of the program to be
> executed for *all* channels if only one of the channels hadn't taken
> the discard branch, potentially increasing the bandwidth and ALU usage
> of the program unnecessarily.
>
> This change increases the FPS by over 3x of a simple micro-benchmark
> that discards a bunch of fragments and then does a single costly
> texturing operation.  I've just re-verified the FPS change on HSW and
> SKL, but I expect all platforms from Gen6 up to get a similar benefit.
>
> Note that we could potentially be more aggressive and use the NORMAL
> predicate to discard individual channels, but that would need to
> happen post-scheduling because the scheduler currently doesn't care to
> reorder HALT instructions with respect to other instructions, and the
> NORMAL predicate would cause the results of subsequent derivative
> computations to become undefined -- If the scheduler didn't reorder
> HALT instructions it would actually be safe to switch to NORMAL
> because the behavior of derivative computations after a non-uniform
> discard statement is undefined by the GLSL spec, but that would make
> the optimization implemented by one of the following commits somewhat
> more difficult.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index d1ac80a..c5067cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1394,9 +1394,7 @@ fs_visitor::emit_discard_jump()
>     fs_inst *discard_jump = bld.emit(FS_OPCODE_DISCARD_JUMP);
>     discard_jump->flag_subreg = 1;
>
> -   discard_jump->predicate = (dispatch_width == 8)
> -                             ? BRW_PREDICATE_ALIGN1_ANY8H
> -                             : BRW_PREDICATE_ALIGN1_ANY16H;
> +   discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
>     discard_jump->predicate_inverse = true;
>  }
>
> --
> 2.9.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
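
To illustrate the difference the commit message describes, here is a rough
CPU-side model of which channels keep executing after the discard jump. It is
only a sketch under assumptions, not driver code: `channels_still_running` and
the `live` mask are hypothetical names that do not exist in Mesa, and the model
assumes the flag subregister holds one bit per channel that stays set while the
pixel has not been discarded. With the inverted ANY<n>H predicate, a group of n
channels takes the jump only when none of its channels is still live.

```cpp
#include <cstdint>

// Hypothetical model, not Mesa code.
//
// live  : one bit per channel, set while that pixel has NOT been discarded
//         (assumed meaning of the flag subregister used by the discard jump).
// width : dispatch width of the thread (8 or 16).
// group : channel group size of the predicate (4 for ANY4H, 8 for ANY8H,
//         16 for ANY16H).
//
// Returns the mask of channels that continue past the discard jump when the
// predicate is inverted: a group jumps to the end of the program only if
// none of its channels is live.
static uint32_t channels_still_running(uint32_t live, unsigned width,
                                       unsigned group)
{
    uint32_t running = 0;

    for (unsigned base = 0; base < width; base += group) {
        const uint32_t group_mask = ((1u << group) - 1u) << base;

        // Any live channel in the group keeps the whole group executing.
        if (live & group_mask)
            running |= group_mask;
    }

    return running;
}
```

For example, with only channel 0 live in a SIMD16 thread,
`channels_still_running(0x0001, 16, 16)` gives `0xffff` (every channel keeps
running), while `channels_still_running(0x0001, 16, 4)` gives `0x000f` (only
the subspan containing the live pixel keeps running).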