On Fri, Jan 10, 2014 at 5:19 PM, Jordan Justen <jljus...@gmail.com> wrote: > On Thu, Dec 19, 2013 at 1:40 PM, Matt Turner <matts...@gmail.com> wrote: >> Previously we simply considered two registers whose live ranges >> overlapped to interfere. Cases such as >> >> set A ------ >> ... | >> mov B, A -- | >> ... | B | A >> use B -- | >> ... | >> use A ------ >> >> would be considered to interfere, even though B is an unmodified copy of >> A whose live range fit wholly inside that of A. >> >> If no writes to A or B occur between the mov B, A and the use of B then >> we can safely coalesce them. >> >> Instead of removing MOV instructions, we make them NOPs and remove them >> at once after the main pass is finished in order to avoid recomputing >> live intervals (which are needed to perform the previous step). >> >> total instructions in shared programs: 1543768 -> 1513077 (-1.99%) >> instructions in affected programs: 951563 -> 920872 (-3.23%) >> GAINED: 46 >> LOST: 22 >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 69 >> ++++++++++++++++++++++++++++++++---- >> 1 file changed, 62 insertions(+), 7 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index e4ac0a5..ad56b87 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -2273,7 +2273,7 @@ fs_visitor::register_coalesce() >> int last_use[MAX_SAMPLER_MESSAGE_SIZE]; >> int next_ip = 0; >> >> - foreach_list_safe(node, &this->instructions) { >> + foreach_list(node, &this->instructions) { >> fs_inst *inst = (fs_inst *)node; >> >> int ip = next_ip; >> @@ -2299,8 +2299,39 @@ fs_visitor::register_coalesce() >> int var_to = live_intervals->var_from_reg(&inst->dst); >> >> if (live_intervals->vars_interfere(var_from, var_to) && >> - !inst->dst.equals(inst->src[0])) >> - continue; >> + !inst->dst.equals(inst->src[0])) { >> + >> + if (live_intervals->end[var_to] > live_intervals->end[var_from]) >> + continue; >> + >> + bool overwritten = false; >> + int scan_ip = -1; >> + >> + foreach_list(n, &this->instructions) { >> + fs_inst *scan_inst = (fs_inst *)n; >> + scan_ip++; >> + >> + if (scan_inst->is_control_flow()) { >> + overwritten = true; >> + break; >> + } >> + >> + if (scan_ip <= live_intervals->start[var_to]) >> + continue; >> + >> + if (scan_ip > live_intervals->end[var_to]) >> + break; >> + >> + if (scan_inst->dst.equals(inst->dst) || >> + scan_inst->dst.equals(inst->src[0])) { >> + overwritten = true; >> + break; >> + } >> + } >> + >> + if (overwritten) >> + continue; >> + } >> >> if (reg_from != inst->src[0].reg) { >> reg_from = inst->src[0].reg; >> @@ -2342,9 +2373,18 @@ fs_visitor::register_coalesce() >> if (live_channels_remaining) >> continue; >> >> + bool removed = false; >> for (int i = 0; i < src_size; i++) { >> - if (mov[i]) >> - mov[i]->remove(); >> + if (mov[i]) { >> + removed = true; >> + >> + mov[i]->opcode = BRW_OPCODE_NOP; >> + mov[i]->conditional_mod = BRW_CONDITIONAL_NONE; > > Is conditional_mod == BRW_CONDITIONAL_NONE something we should look > for before considering coalescing a mov?
I don't actually know what a mov with a conditional mod would do, and we don't emit a single one in all of shader-db, but it seems like we should be considering predication. >> + mov[i]->dst = reg_undef; >> + mov[i]->src[0] = reg_undef; >> + mov[i]->src[1] = reg_undef; >> + mov[i]->src[2] = reg_undef; >> + } >> } >> >> foreach_list(node, &this->instructions) { >> @@ -2366,11 +2406,26 @@ fs_visitor::register_coalesce() >> scan_inst->src[j].reg_offset = reg_to_offset[i]; >> } >> } >> - >> - progress = true; >> } >> } >> } >> + >> + if (removed) { >> + live_intervals->start[var_to] = MIN2(live_intervals->start[var_to], >> + >> live_intervals->start[var_from]); >> + live_intervals->end[var_to] = MAX2(live_intervals->end[var_to], >> + live_intervals->end[var_from]); >> + reg_from = -1; > > Do you think these are what lead to the lost programs? I don't think so. > I haven't been able to determine why they are needed starting with this > patch... Consider add vgrf3:F, vgrf1:F, vgrf2:F mov vgrf4:F, vgrf3:F mul vgrf5:F, vgrf5:F, vgrf4:F register coalescing turns this into add vgrf4:F, vgrf1:F, vgrf2:F mul vgrf5:F, vgrf5:F, vgrf4:F and now our live intervals are wrong, and calculating live intervals is expensive. Instead of recalculating live intervals after each successful iteration, we just fix var_to's start and end to keep them valid. To make sure that none of the other instructions are affected, we temporarily replace the MOV with a NOP and then after the pass is over remove all of the NOPs when it's safe to invalidate the live intervals. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev