On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
> This won't completely solve the problem. For example, what if you > hoist the assignment to color2 outside the loop? > > vec4 color2; > while (1) { > vec4 color = texture(); > color2 = color * 2; > if (...) { > break; > } > } > gl_FragColor = color2; > > > Now the definition still dominates the use, even with the modified > control-flow graph, and you have the same problem Curro had me convinced that some detail of the liveness analysis pass saved us here but now I can't remember what. :-( > The real problem is > that the assignment to color2 is really a conditional assignment: if > we're going channel-by-channel, it's not, but if you consider the > *whole* register at the same time, it is. To really fix the problem, > you need to model exactly what the machine actually does: you need to > insert "fake" edges like these, that model the jumps that the machine > can take, and you need to make every assignment a conditional > assignment (i.e. it doesn't kill the register). It's probably not as > bad with Curro's patch on top, though. Also, once you do this you can > make register allocation more accurate by generating interferences > from the liveness information directly instead of from the intervals. > > One thing I've thought about is, in addition to maintaining this > "whole-vector" view of things, is to maintain a "per-channel" liveness > that doesn't use the extra edges, partial definitions etc. and then > use the "per-channel view" to calculate interference when the channels > always line up. > Yes, we've considered that and it's a good idea. However, I'm trying to fix bugs right now, not write the world's best liveness analysis pass. :-) > On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <ja...@jlekstrand.net> > wrote: > > Shader-db results on Sky Lake: > > > > total instructions in shared programs: 12955125 -> 12953698 (-0.01%) > > instructions in affected programs: 55956 -> 54529 (-2.55%) > > helped: 6 > > HURT: 38 > > > > All of the hurt programs were hurt by exactly one instruction because > > this patch affects copy propagation. Most of the helped instructions > > came from a single orbital explorer shader that was helped by 14.26% > > > > Cc: mesa-sta...@lists.freedesktop.org > > --- > > src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++ > +++++-- > > 1 file changed, 35 insertions(+), 2 deletions(-) > > > > diff --git a/src/intel/compiler/brw_cfg.cpp > b/src/intel/compiler/brw_cfg.cpp > > index fad12ee..d8bf725 100644 > > --- a/src/intel/compiler/brw_cfg.cpp > > +++ b/src/intel/compiler/brw_cfg.cpp > > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions) > > assert(cur_while != NULL); > > cur->add_successor(mem_ctx, cur_while); > > > > + /* We also add the next block as a successor of the break. If > the > > + * break is predicated, we need to do this because the break > may not > > + * be taken. If the break is not predicated, we add it anyway > so > > + * that our live intervals computations will operate as if the > break > > + * may or may not be taken. Consider the following example: > > + * > > + * vec4 color2; > > + * while (1) { > > + * vec4 color = texture(); > > + * if (...) { > > + * color2 = color * 2; > > + * break; > > + * } > > + * } > > + * gl_FragColor = color2; > > + * > > + * In this case, the definition of color2 dominates the use > because > > + * the loop only has the one exit. This means that the live > range > > + * interval for color2 goes from the statement in the if to > it's use > > + * below the loop. Now suppose that the texture operation has > a > > + * header register that gets assigned one of the registers > used for > > + * color2. If the loop condition is non-uniform and some of > the > > + * threads will take the break and others will continue. In > this > > + * case, the next pass through the loop, the WE_all setup of > the > > + * header register will stomp the disabled channels of color2 > and > > + * corrupt the value. > > + * > > + * This same problem can occur if you have a mix of 64, 32, and > > + * 16-bit registers because the channels do not line up or if > you > > + * have a SIMD16 program and the first half of one value > overlaps the > > + * second half of the other. To solve it, we simply treat the > break > > + * as if it may also continue on because some of the threads > may > > + * continue on. > > + */ > > next = new_block(); > > - if (inst->predicate) > > - cur->add_successor(mem_ctx, next); > > + cur->add_successor(mem_ctx, next); > > > > set_next_block(&cur, next, ip); > > break; > > -- > > 2.5.0.400.gff86faf > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev