On Tue, Apr 16, 2013 at 2:46 AM, Vadim Girlin <vadimgir...@gmail.com> wrote: > On 04/15/2013 11:22 PM, Martin Andersson wrote: >> >> There is a hardware bug on Cayman where a BREAK/CONTINUE followed by >> LOOP_STARTxxx for nested loops may put the branch stack into a state >> such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this >> by replacing the ALU_PUSH_BEFORE with a PUSH + ALU >> >> Fixes piglit tests EXT_transform_feedback/order* >> >> v2: Use existing loop count and improve comment >> --- >> src/gallium/drivers/r600/r600_shader.c | 17 ++++++++++++++--- >> 1 file changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/src/gallium/drivers/r600/r600_shader.c >> b/src/gallium/drivers/r600/r600_shader.c >> index 6dbca50..f4398fd 100644 >> --- a/src/gallium/drivers/r600/r600_shader.c >> +++ b/src/gallium/drivers/r600/r600_shader.c >> @@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx) >> return 0; >> } >> >> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode) >> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int >> alu_type) >> { >> struct r600_bytecode_alu alu; >> int r; >> @@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx >> *ctx, int opcode) >> >> alu.last = 1; >> >> - r = r600_bytecode_add_alu_type(ctx->bc, &alu, >> CF_OP_ALU_PUSH_BEFORE); >> + r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type); >> if (r) >> return r; >> return 0; >> @@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct >> r600_shader_ctx *ctx, unsigned fc_sp) >> >> static int tgsi_if(struct r600_shader_ctx *ctx) >> { >> - emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT); >> + int alu_type = CF_OP_ALU_PUSH_BEFORE; >> + >> + /* There is a hardware bug on Cayman where a BREAK/CONTINUE >> followed by >> + * LOOP_STARTxxx for nested loops may put the branch stack into a >> state >> + * such that ALU_PUSH_BEFORE doesn't work as expected. Workaround >> this >> + * by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */ >> + if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) { >> + r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH); > > > Oh, it seems I overlooked potential issue here: jump address for PUSH is not > set properly, so I guess there will be GPU lockups in case of a jump. > Ideally we could set it to jump over the whole IF-ENDIF block if there are > no active threads, but I think it's a rare case, so simplest fix is to avoid > computation of the address and set jump address for PUSH to the next > instruction, like this: > > ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2; > > We can improve it later but anyway ALU_PUSH_BEFORE never jumped at all so I > think at least we won't have any serious performance regressions. > > Everything else looks ok, so I think I'll commit your patch with this change > soon if there are no objections.
Sounds good. Please add a note that this patch is a candidate for the 9.1 branch when you apply it. Alex > > Vadim > > >> + alu_type = CF_OP_ALU; >> + } >> + >> + emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type); >> >> r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP); >> >> > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev