New patch on the list. On Wed, Oct 4, 2017 at 7:42 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand <ja...@jlekstrand.net> > wrote: > >> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbo...@gmail.com> >> wrote: >> >>> This won't completely solve the problem. For example, what if you >>> hoist the assignment to color2 outside the loop? >>> >>> vec4 color2; >>> while (1) { >>> vec4 color = texture(); >>> color2 = color * 2; >>> if (...) { >>> break; >>> } >>> } >>> gl_FragColor = color2; >>> >>> >>> Now the definition still dominates the use, even with the modified >>> control-flow graph, and you have the same problem >> >> >> Curro had me convinced that some detail of the liveness analysis pass >> saved us here but now I can't remember what. :-( >> >> >>> The real problem is >>> that the assignment to color2 is really a conditional assignment: if >>> we're going channel-by-channel, it's not, but if you consider the >>> *whole* register at the same time, it is. To really fix the problem, >>> you need to model exactly what the machine actually does: you need to >>> insert "fake" edges like these, that model the jumps that the machine >>> can take, and you need to make every assignment a conditional >>> assignment (i.e. it doesn't kill the register). It's probably not as >>> bad with Curro's patch on top, though. Also, once you do this you can >>> make register allocation more accurate by generating interferences >>> from the liveness information directly instead of from the intervals. >>> >>> One thing I've thought about is, in addition to maintaining this >>> "whole-vector" view of things, is to maintain a "per-channel" liveness >>> that doesn't use the extra edges, partial definitions etc. and then >>> use the "per-channel view" to calculate interference when the channels >>> always line up. >>> >> >> Yes, we've considered that and it's a good idea. However, I'm trying to >> fix bugs right now, not write the world's best liveness analysis pass. :-) >> > > You're correct, as usual... I've inspected the result of liveness anlaysis > and we do indeed get it wrong. I'll come up with something less bogus. > > >> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <ja...@jlekstrand.net> >>> wrote: >>> > Shader-db results on Sky Lake: >>> > >>> > total instructions in shared programs: 12955125 -> 12953698 >>> (-0.01%) >>> > instructions in affected programs: 55956 -> 54529 (-2.55%) >>> > helped: 6 >>> > HURT: 38 >>> > >>> > All of the hurt programs were hurt by exactly one instruction because >>> > this patch affects copy propagation. Most of the helped instructions >>> > came from a single orbital explorer shader that was helped by 14.26% >>> > >>> > Cc: mesa-sta...@lists.freedesktop.org >>> > --- >>> > src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++ >>> +++++-- >>> > 1 file changed, 35 insertions(+), 2 deletions(-) >>> > >>> > diff --git a/src/intel/compiler/brw_cfg.cpp >>> b/src/intel/compiler/brw_cfg.cpp >>> > index fad12ee..d8bf725 100644 >>> > --- a/src/intel/compiler/brw_cfg.cpp >>> > +++ b/src/intel/compiler/brw_cfg.cpp >>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions) >>> > assert(cur_while != NULL); >>> > cur->add_successor(mem_ctx, cur_while); >>> > >>> > + /* We also add the next block as a successor of the break. >>> If the >>> > + * break is predicated, we need to do this because the break >>> may not >>> > + * be taken. If the break is not predicated, we add it >>> anyway so >>> > + * that our live intervals computations will operate as if >>> the break >>> > + * may or may not be taken. Consider the following example: >>> > + * >>> > + * vec4 color2; >>> > + * while (1) { >>> > + * vec4 color = texture(); >>> > + * if (...) { >>> > + * color2 = color * 2; >>> > + * break; >>> > + * } >>> > + * } >>> > + * gl_FragColor = color2; >>> > + * >>> > + * In this case, the definition of color2 dominates the use >>> because >>> > + * the loop only has the one exit. This means that the live >>> range >>> > + * interval for color2 goes from the statement in the if to >>> it's use >>> > + * below the loop. Now suppose that the texture operation >>> has a >>> > + * header register that gets assigned one of the registers >>> used for >>> > + * color2. If the loop condition is non-uniform and some of >>> the >>> > + * threads will take the break and others will continue. In >>> this >>> > + * case, the next pass through the loop, the WE_all setup of >>> the >>> > + * header register will stomp the disabled channels of >>> color2 and >>> > + * corrupt the value. >>> > + * >>> > + * This same problem can occur if you have a mix of 64, 32, >>> and >>> > + * 16-bit registers because the channels do not line up or >>> if you >>> > + * have a SIMD16 program and the first half of one value >>> overlaps the >>> > + * second half of the other. To solve it, we simply treat >>> the break >>> > + * as if it may also continue on because some of the threads >>> may >>> > + * continue on. >>> > + */ >>> > next = new_block(); >>> > - if (inst->predicate) >>> > - cur->add_successor(mem_ctx, next); >>> > + cur->add_successor(mem_ctx, next); >>> > >>> > set_next_block(&cur, next, ip); >>> > break; >>> > -- >>> > 2.5.0.400.gff86faf >>> > >>> > _______________________________________________ >>> > mesa-dev mailing list >>> > mesa-dev@lists.freedesktop.org >>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> >> >> >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev