On Fri, Oct 23, 2015 at 5:38 AM, Francisco Jerez <curroje...@riseup.net> wrote: > Kristian Høgsberg <k...@bitplanet.net> writes: > >> On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Kristian Høgsberg <k...@bitplanet.net> writes: >>> >>>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez <curroje...@riseup.net> >>>> wrote: >>>>> Kristian Høgsberg <k...@bitplanet.net> writes: >>>>> >>>>>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez <curroje...@riseup.net> >>>>>> wrote: >>>>>>> Neil Roberts <n...@linux.intel.com> writes: >>>>>>> >>>>>>>> Just a thought, would it be better to move this check into the >>>>>>>> eliminate_find_live_channel optimisation? That way it could catch >>>>>>>> sources that become immediates through later optimisations. One problem >>>>>>>> with this that I've seen before is that eliminating the >>>>>>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >>>>>>>> eliminated because the copy propagation doesn't work. >>>>>>> >>>>>>> I believe in this particular case the BROADCAST instruction should >>>>>>> already be eliminated (by opt_algebraic), because its first source is an >>>>>>> immediate, so this optimization would in fact be redundant with >>>>>>> opt_algebraic+constant propagation if it weren't because the latter is >>>>>>> unable to handle scalar copies. >>>>>> >>>>>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets >>>>>> eliminated because it's outside control flow >>>>>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets >>>>>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate >>>>>> constant). The problem then is the dst of the MOV has stride 0 which >>>>>> can_propagate_from() then bails on. >>>>>> >>>>>>> Another possibility that would likely >>>>>>> be more general than this (because it would handle cases in which, as >>>>>>> Neil said, the argument becomes an immediate only after optimization, >>>>>>> and because it would also avoid the issue with liveness analysis >>>>>>> extending the live ranges of scalar values incorrectly [1]) would be to >>>>>>> extend the destination register of the BROADCAST instructions to a >>>>>>> vector [2] and then add a case to the switch statement of >>>>>>> try_constant_propagate() so that the immediate MOV resulting from >>>>>>> opt_algebraic is propagated into surface instructions (see attachment). >>>>>> >>>>>> I wrote exactly this code to make constant propagate work, plus adding >>>>>> the extra opcodes to the switch statement. It works and I could >>>>>> certainly send that out after this, but >>>>>> >>>>>> 1) This doesn't mean we shouldn't do the if (src.file == IMM) >>>>>> shortcut. It saves the compiler a bit of work in the very common case >>>>>> of >>>>>> non-indirect buffer access. >>>>>> >>>>>> 2) I'm not even sure it makes sense to extend copy-propagation to do >>>>>> this (which is why I went back to just the IMM test). Anything that >>>>>> would be an immediate at this point should be an immediate, if not >>>>>> we're missing something in nir. >>>>>> >>>>> Still this doesn't address the root of the problem, which is that >>>>> emit_uniformize() emits scalar code that the rest of the compiler is not >>>>> able to handle properly. >>>> >>>> This is not a case of papering over the root cause, it's about not >>>> creating the root cause in the first place. The output of >>>> emit_uniformize() always ends up as either a surface or a sampler >>>> index, which we only look at later in the generator. There are no >>>> other cases where the result of emit_uniformize() might be part of an >>>> expression that we can copy propagate or otherwise optimize. If the >>>> input to emit_uniformize() isn't an immediate where it could be, nir >>>> optimization needs fixing. So if we add these two lines to >>>> emit_uniformize() to pass immediates straight through, we avoid >>>> generating code that we have to extend the copy prop pass to handle. >>>> >>> >>> Kristian, there are legitimate uses of emit_uniformize() in which the >>> argument is not an immediate but still can be optimized out later on -- >>> E.g. for images it will frequently be a uniform register, or a >>> non-constant expression calculated within uniform control flow (in which >>> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a >>> constant MOV by eliminate_find_live_chaso nnel()). In such cases we still >>> want copy-propagation to kick in, but it wont because of the problem I >>> was talking about with scalar writes. Even if the instructions emitted >>> by emit_uniformize() cannot be optimized out, liveness analysis will >>> overestimate the live ranges of the temporaries used by >>> emit_uniformize() for the same reason, potentially causing the register >>> allocator to spill or run out of registers. >> >> Yeah, fair point. I went and took a look at non-constant surface index >> and sent out a new series that make that work better as well as ssbo >> write optimizations. I took out the imm shortcut and put in the >> generic optimizations. It makes non-const surface index a little >> better, but as I wrote in the cover letter, it still doesn't work that >> well as we don't propagate the surface index into the send >> instructions. >> >>> Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I >>> just have the feeling that by fixing this more generally you could've >>> saved us [or most likely me ;)] work in the near future. >> >> I may be saving you some time, but I've been working on this for >> longer than I expected ;-) >> > Thanks, and sorry for the trouble. :)
And thanks for the reviews, pushed the series now. I'll leave scalar copy propagation to you then. Kristian >> Kristian >> >>>> Kristian >>>> >>>>> Re-implementing a special case of the >>>>> optimization already done by opt_algebraic() might have helped in this >>>>> specific case, but it won't help when the argument is of some other kind >>>>> of uniform value which are also frequently encountered in practice and >>>>> could also be copy-propagated, and it won't help avoid the liveness >>>>> analysis bug [1] in cases where the argument is not an immediate (the >>>>> bug is reported to break some thousands SSBO dEQP tests, although I >>>>> don't know what fraction of them actually use non-constant indexing). >>>>> The alternative solution I provided patches for (and you seem to have >>>>> implemented yourself independently) would address all these issues at >>>>> once. >>>>> >>>>>> Kristian >>>>>> >>>>>>>> I made this patch a while ago but I never posted it anywhere because >>>>>>>> it's a of a kludge and it would probably be better to fix the copy >>>>>>>> propagation: >>>>>>>> >>>>>>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa >>>>>>>> >>>>>>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will >>>>>>> become much easier with the use-def-chain analysis pass I'm working on. >>>>>>> >>>>>>>> Either way though I don't think it would do any harm to have Kristian's >>>>>>>> patch as well even if we did improve elimintate_find_live_channel so it >>>>>>>> is: >>>>>>>> >>>>>>>> Reviewed-by: Neil Roberts <n...@linux.intel.com> >>>>>>>> >>>>>>>> - Neil >>>>>>>> >>>>>>>> Kristian Høgsberg Kristensen <k...@bitplanet.net> writes: >>>>>>>> >>>>>>>>> An immdiate is already uniform so just return it up front. Without >>>>>>>>> this, >>>>>>>>> brw_fs_surface_builder ends up passing immediate surface indices >>>>>>>>> through >>>>>>>>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't >>>>>>>>> constant propagate out of, and further, we don't constant propagate >>>>>>>>> into >>>>>>>>> the typed/untype read/write opcodes at all. The end result is that >>>>>>>>> all >>>>>>>>> typed/untyped read/write/atomics end up as indirect sends. >>>>>>>>> >>>>>>>>> Code generation should always produce either an immediate or an actual >>>>>>>>> indirect surface index, so we can fix this by just special casing >>>>>>>>> immediates in emit_uniformize. >>>>>>>>> >>>>>>>>> Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net> >>>>>>>>> --- >>>>>>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ >>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>>>>> index df10a9d..98ce71e 100644 >>>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>>>>> @@ -394,6 +394,9 @@ namespace brw { >>>>>>>>> const dst_reg chan_index = >>>>>>>>> component(vgrf(BRW_REGISTER_TYPE_UD), 0); >>>>>>>>> const dst_reg dst = component(vgrf(src.type), 0); >>>>>>>>> >>>>>>>>> + if (src.file == IMM) >>>>>>>>> + return src; >>>>>>>>> + >>>>>>>>> ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); >>>>>>>>> ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); >>>>>>>>> >>>>>>>>> -- >>>>>>>>> 2.6.2 >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>> _______________________________________________ >>>>>>>> mesa-dev mailing list >>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html >>>>>>> [2] >>>>>>> http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch >>>>>>> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev