On Tuesday, January 17, 2017 11:40:33 AM PST Francisco Jerez wrote: > Kenneth Graunke <kenn...@whitecape.org> writes: > > > (Co-authored by Matt Turner.) > > > > Image atomics, for example, return a value - but the shader may not > > want to use it. We assigned a useless VGRF destination. This seemed > > harmless, but it can actually be quite harmful. The register allocator > > has to assign that VGRF to a real register. It may assign the same > > actual GRF to the destination of an instruction that follows soon after. > > > > This results in a write-after-write (WAW) dependency, and stall. > > > > A number of "Deus Ex: Mankind Divided" shaders use image atomics, but > > don't use the return value. Several of these were hitting WAW stalls > > for nearly 14,000 (poorly estimated) cycles a pop. Making dead code > > elimination null out the destination avoids this issue. > > > > This patch cuts one shader's estimated cycles by -98.39%! Removing the > > message response should also help with data cluster bandwidth. > > > > On Skylake: > > > > total instructions in shared programs: 13366907 -> 13363051 (-0.03%) > > instructions in affected programs: 49635 -> 45779 (-7.77%) > > helped: 133 > > HURT: 0 > > > > total cycles in shared programs: 255433388 -> 248081818 (-2.88%) > > cycles in affected programs: 12370702 -> 5019132 (-59.43%) > > helped: 100 > > HURT: 24 > > > > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> > > --- > > .../dri/i965/brw_fs_dead_code_eliminate.cpp | 56 > > ++++++++++++++++------ > > 1 file changed, 41 insertions(+), 15 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > > index 930dc733b45..885ae2638a8 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > > @@ -34,6 +34,43 @@ > > * yet in the tail end of this block. > > */ > > > > +static bool > > +can_eliminate(const fs_inst *inst, BITSET_WORD *flag_live) > > +{ > > + return inst->opcode != BRW_OPCODE_IF && > > + inst->opcode != BRW_OPCODE_WHILE && > > Isn't this missing a bunch of other control flow instructions that > shouldn't be eliminated? What's going on here, are they also being > DCE'ed? Shouldn't this be !inst->is_control_flow()?
That would make more sense. Maybe Matt knows if there's a reason. I'm just moving code around here. I can write a patch to change that first, though... > > + !inst->has_side_effects() && > > + !(flag_live[0] & inst->flags_written()) && > > + !inst->writes_accumulator; > > +} > > + > > +static bool > > +can_omit_write(const fs_inst *inst, BITSET_WORD *flag_live) > > +{ > > + switch (inst->opcode) { > > + case SHADER_OPCODE_UNTYPED_ATOMIC: > > + case SHADER_OPCODE_UNTYPED_ATOMIC_LOGICAL: > > + case SHADER_OPCODE_TYPED_ATOMIC: > > + case SHADER_OPCODE_TYPED_ATOMIC_LOGICAL: > > + return true; > > + default: > > + /* If we're going to eliminate the instruction entirely, omitting the > > + * write is always safe. > > + */ > > + if (can_eliminate(inst, flag_live)) > > + return true; > > I think it would be cleaner to make this a predicate of the instruction > alone meaning 'can the instruction accept a null register as > destination?' (which is independent of the context it occurs in, and > can_eliminate() doesn't have any influence on, it's just that we're okay > with the pass below putting invalid instructions in the program as long > as we have the guarantee that they're going to be removed later on), and > do 'can_omit_write(inst) || can_eliminate(inst, flag_live)' below. Yeah, that does seem nicer. I'll make that change. > > + > > + /* We can eliminate the destination write for ordinary instructions, > > + * but not most SENDs. > > + */ > > + if (inst->opcode < 128 && inst->mlen == 0) > > + return true; > > + > > This is going to miss ALU-like virtual instructions, but I guess they > could be enumerated in the 'return true' case above together with > surface atomics. [Not saying that you necessarily need to do that > now ;)] Right. I figured this was a good start, and we can enumerate our own crazy opcodes above as we see fit. If you've got a better idea...
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev