On Wed, Oct 4, 2017 at 8:35 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >> >> This won't completely solve the problem. For example, what if you >> hoist the assignment to color2 outside the loop? >> >> vec4 color2; >> while (1) { >> vec4 color = texture(); >> color2 = color * 2; >> if (...) { >> break; >> } >> } >> gl_FragColor = color2; >> >> >> Now the definition still dominates the use, even with the modified >> control-flow graph, and you have the same problem > > > Curro had me convinced that some detail of the liveness analysis pass saved > us here but now I can't remember what. :-( > >> >> The real problem is >> that the assignment to color2 is really a conditional assignment: if >> we're going channel-by-channel, it's not, but if you consider the >> *whole* register at the same time, it is. To really fix the problem, >> you need to model exactly what the machine actually does: you need to >> insert "fake" edges like these, that model the jumps that the machine >> can take, and you need to make every assignment a conditional >> assignment (i.e. it doesn't kill the register). It's probably not as >> bad with Curro's patch on top, though. Also, once you do this you can >> make register allocation more accurate by generating interferences >> from the liveness information directly instead of from the intervals. >> >> One thing I've thought about is, in addition to maintaining this >> "whole-vector" view of things, is to maintain a "per-channel" liveness >> that doesn't use the extra edges, partial definitions etc. and then >> use the "per-channel view" to calculate interference when the channels >> always line up. > > > Yes, we've considered that and it's a good idea. However, I'm trying to fix > bugs right now, not write the world's best liveness analysis pass. :-)
That's fair, although just implementing the first bit shouldn't be too hard. > >> >> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <ja...@jlekstrand.net> >> wrote: >> > Shader-db results on Sky Lake: >> > >> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%) >> > instructions in affected programs: 55956 -> 54529 (-2.55%) >> > helped: 6 >> > HURT: 38 >> > >> > All of the hurt programs were hurt by exactly one instruction because >> > this patch affects copy propagation. Most of the helped instructions >> > came from a single orbital explorer shader that was helped by 14.26% >> > >> > Cc: mesa-sta...@lists.freedesktop.org >> > --- >> > src/intel/compiler/brw_cfg.cpp | 37 >> > +++++++++++++++++++++++++++++++++++-- >> > 1 file changed, 35 insertions(+), 2 deletions(-) >> > >> > diff --git a/src/intel/compiler/brw_cfg.cpp >> > b/src/intel/compiler/brw_cfg.cpp >> > index fad12ee..d8bf725 100644 >> > --- a/src/intel/compiler/brw_cfg.cpp >> > +++ b/src/intel/compiler/brw_cfg.cpp >> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions) >> > assert(cur_while != NULL); >> > cur->add_successor(mem_ctx, cur_while); >> > >> > + /* We also add the next block as a successor of the break. If >> > the >> > + * break is predicated, we need to do this because the break >> > may not >> > + * be taken. If the break is not predicated, we add it anyway >> > so >> > + * that our live intervals computations will operate as if the >> > break >> > + * may or may not be taken. Consider the following example: >> > + * >> > + * vec4 color2; >> > + * while (1) { >> > + * vec4 color = texture(); >> > + * if (...) { >> > + * color2 = color * 2; >> > + * break; >> > + * } >> > + * } >> > + * gl_FragColor = color2; >> > + * >> > + * In this case, the definition of color2 dominates the use >> > because >> > + * the loop only has the one exit. This means that the live >> > range >> > + * interval for color2 goes from the statement in the if to >> > it's use >> > + * below the loop. Now suppose that the texture operation has >> > a >> > + * header register that gets assigned one of the registers >> > used for >> > + * color2. If the loop condition is non-uniform and some of >> > the >> > + * threads will take the break and others will continue. In >> > this >> > + * case, the next pass through the loop, the WE_all setup of >> > the >> > + * header register will stomp the disabled channels of color2 >> > and >> > + * corrupt the value. >> > + * >> > + * This same problem can occur if you have a mix of 64, 32, >> > and >> > + * 16-bit registers because the channels do not line up or if >> > you >> > + * have a SIMD16 program and the first half of one value >> > overlaps the >> > + * second half of the other. To solve it, we simply treat the >> > break >> > + * as if it may also continue on because some of the threads >> > may >> > + * continue on. >> > + */ >> > next = new_block(); >> > - if (inst->predicate) >> > - cur->add_successor(mem_ctx, next); >> > + cur->add_successor(mem_ctx, next); >> > >> > set_next_block(&cur, next, ip); >> > break; >> > -- >> > 2.5.0.400.gff86faf >> > >> > _______________________________________________ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev