I think I would need some hints here, I don't know how to test my changes now properly.
On Sat, Mar 22, 2014 at 8:26 AM, Matt Turner <matts...@gmail.com> wrote: > On Fri, Mar 21, 2014 at 11:17 PM, Matt Turner <matts...@gmail.com> wrote: >> On Fri, Mar 21, 2014 at 7:33 AM, Juha-Pekka Heikkila >> <juhapekka.heikk...@gmail.com> wrote: >>> Our hardware has an "accumulator" register, which can be used to store >>> intermediate results across multiple instructions. Many instructions >>> can implicitly write a value to the accumulator in addition to their >>> normal destination register. This is enabled by the "AccWrEn" flag. >>> >>> This patch introduces a new flag, inst->writes_accumulator, which >>> allows us to express the AccWrEn notion in the IR. It also creates a >>> n ALU2_ACC macro to easily define emitters for instructions that >>> implicitly write the accumulator. >>> >>> Previously, we only supported implicit accumulator writes from the >>> ADDC, SUBB, and MACH instructions. We always enabled them on those >>> instructions, and left them disabled for other instructions. >>> >>> To take advantage of the MAC (multiply-accumulate) instruction, we >>> need to be able to set AccWrEn on other types of instructions. >>> >>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com> >>> --- >>> src/mesa/drivers/dri/i965/brw_fs.cpp | 34 >>> +++++++++++------------- >>> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 7 +---- >>> src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++ >>> src/mesa/drivers/dri/i965/brw_shader.h | 1 + >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 15 +---------- >>> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 7 +---- >>> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 21 +++++++++++---- >>> 7 files changed, 39 insertions(+), 49 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> index 27cf0f6..107d9e6 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> @@ -64,6 +64,8 @@ fs_inst::init() >>> >>> /* This will be the case for almost all instructions. */ >>> this->regs_written = 1; >>> + >>> + this->writes_accumulator = false; >>> } >>> >>> fs_inst::fs_inst() >>> @@ -151,6 +153,15 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst, >>> return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1); \ >>> } >>> >>> +#define ALU2_ACC(op) \ >>> + fs_inst * \ >>> + fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1) \ >>> + { \ >>> + fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, >>> src1);\ >>> + inst->writes_accumulator = true; \ >>> + return inst; \ >>> + } >>> + >>> #define ALU3(op) \ >>> fs_inst * \ >>> fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2) \ >>> @@ -166,7 +177,7 @@ ALU1(RNDE) >>> ALU1(RNDZ) >>> ALU2(ADD) >>> ALU2(MUL) >>> -ALU2(MACH) >>> +ALU2_ACC(MACH) >>> ALU2(AND) >>> ALU2(OR) >>> ALU2(XOR) >>> @@ -182,8 +193,8 @@ ALU1(FBH) >>> ALU1(FBL) >>> ALU1(CBIT) >>> ALU3(MAD) >>> -ALU2(ADDC) >>> -ALU2(SUBB) >>> +ALU2_ACC(ADDC) >>> +ALU2_ACC(SUBB) >>> ALU2(SEL) >>> >>> /** Gen4 predicated IF. */ >>> @@ -2103,21 +2114,8 @@ fs_visitor::dead_code_eliminate() >>> } >>> >>> if (dead) { >>> - /* Don't dead code eliminate instructions that write to the >>> - * accumulator as a side-effect. Instead just set the >>> destination >>> - * to the null register to free it. >>> - */ >>> - switch (inst->opcode) { >>> - case BRW_OPCODE_ADDC: >>> - case BRW_OPCODE_SUBB: >>> - case BRW_OPCODE_MACH: >>> - inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type)); >>> - break; >>> - default: >>> - inst->remove(); >>> - progress = true; >>> - break; >>> - } >>> + inst->remove(); >>> + progress = true; >> >> I think you meant something like >> >> if (inst->writes_accumulator) { >> inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type)); >> } else { >> inst->remove(); >> } >> progress = true; >> >> rather than removing instructions that implicitly wrote the >> accumulator when their destination was dead. I did take the entire switch/case block out because it seemed of very little use and possibly more harmful than useful. I was trying to see when the switch case I took out was used for ADDC/SUBB/MACH but it seemed to execute close to never on running piglit glsl tests. Reason that part being skipped is when ADDC/SUBB opcodes are used the destination register is set to dst_null_ud() before arriving here, similar is almost always true for MACH. I did go checking and found null register is not grf but considered arf which is internally enumerated to HW_REG, this causes above switch case to be skipped with 'if's. For example in vec4_visitor::dead_code_eliminate() there is just outside the switch case: ... if (inst->dst.file == GRF && !inst->has_side_effects()) { ... For MACH there however can be found instance in vec4_visitor::visit(ir_expression *ir) (and similar in fs_visitor) where it says ... case ir_binop_imul_high: { struct brw_reg acc = retype(brw_acc_reg(), result_dst.type); emit(MUL(acc, op[0], op[1])); emit(MACH(result_dst, op[0], op[1])); break; } ... Above piece of code to my understanding should go wrong in current version dead code elimination where MACH is forced to have destination in null register? After using writes_accumulator flag one can get similar difficulties using many other opcodes. This was prevented when I had has_side_effects() returning true when writes_accumulator flag was set -- above "if" dst.file == GRF is true but then inst->has_side_effects() prevent from changing the destination. > > Oh, I see. You added writes_accumulator to has_side_effects(). I don't > think we want to do that. It weakens dead code elimination and > instruction scheduling. Instruction scheduling in particular, because > we treat instructions with side effects as full barriers, preventing > even independent instructions from being scheduled across the barrier. > > I don't see anything in the scheduler that handles the accumulator, > but it shouldn't be very difficult to do it properly. Handling it like > we do the flag register (look for last_conditional_mod) would probably > be good. I removed my changes from has_side_effects() and did try several variations to use writes_accumulator flag in calculate_deps() but the trouble is I never seem to get any regression. Even when I completely removed all the checks for the accumulator flag from calculate_deps() I don't get any regression so I figure I am not getting hit by scheduling. Here is where I would need some idea how to check my change does not break something, currently I don't know what to look for. /Juha-Pekka _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev