On Wed, Apr 9, 2014 at 3:06 PM, Eric Anholt <e...@anholt.net> wrote: > Matt Turner <matts...@gmail.com> writes: > >> --- >> src/mesa/drivers/dri/i965/brw_shader.cpp | 16 ++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_shader.h | 1 + >> 2 files changed, 17 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp >> b/src/mesa/drivers/dri/i965/brw_shader.cpp >> index f194437..c8796b3 100644 >> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp >> @@ -664,6 +664,22 @@ backend_instruction::can_do_saturate() const >> } >> >> bool >> +backend_instruction::reads_accumulator_implicitly() const >> +{ >> + switch (opcode) { >> + case BRW_OPCODE_MAC: >> + case BRW_OPCODE_MACH: >> + /* FINISHME: Enable these if we ever start emitting them. >> + * case BRW_OPCODE_SADA: >> + * case BRW_OPCODE_SADA2: >> + */ > > Let's just uncomment SADA2 right away to prevent pain in the future. > SAD2 doesn't read the acc, though. > > Other than that, the first 2 patches are: > > Reviewed-by: Eric Anholt <e...@anholt.net> > > I think scheduling is still broken in the last one, because you're > removing the barrier deps on implicit-accumulator opcodes and replacing > them with explicit dependencies, but you're not tracking the accumulator > updates by almost-all-instructions pre-gen6. The scheduler would be > free to slip in some unrelated instruction after the MUL in the > following snippet from brw_vec4_visitor.cpp:
Ah, that is true. I went looking for text about this, since I didn't know about it until you mentioned it recently. I see in the GM45 docs a 'Accumulator Disable' bit in cr0. I wonder whether all of the false write-after-write dependencies on the accumulator actually cause stalls, and if so whether we should attempt to disable accumulator writes. We don't seem to have any cases where we rely on implicit accumulator updates that we couldn't replace with explicit accumulator destinations. > > emit(MUL(acc, op[0], op[1])); > emit(MACH(dst_null_d(), op[0], op[1])); > emit(MOV(result_dst, src_reg(acc))); > > (err, why are we doing MACH and MOV instead of just MACH into > result_dst?) mach writes the *high* 32-bits of the result into its destination (so useful for *mulExtended()). _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev