On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez <curroje...@riseup.net> wrote: > Kristian Høgsberg <k...@bitplanet.net> writes: > >> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Neil Roberts <n...@linux.intel.com> writes: >>> >>>> Just a thought, would it be better to move this check into the >>>> eliminate_find_live_channel optimisation? That way it could catch >>>> sources that become immediates through later optimisations. One problem >>>> with this that I've seen before is that eliminating the >>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >>>> eliminated because the copy propagation doesn't work. >>> >>> I believe in this particular case the BROADCAST instruction should >>> already be eliminated (by opt_algebraic), because its first source is an >>> immediate, so this optimization would in fact be redundant with >>> opt_algebraic+constant propagation if it weren't because the latter is >>> unable to handle scalar copies. >> >> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets >> eliminated because it's outside control flow >> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets >> reduced to a MOV in opt_algebraic because src0 is uniform (immediate >> constant). The problem then is the dst of the MOV has stride 0 which >> can_propagate_from() then bails on. >> >>> Another possibility that would likely >>> be more general than this (because it would handle cases in which, as >>> Neil said, the argument becomes an immediate only after optimization, >>> and because it would also avoid the issue with liveness analysis >>> extending the live ranges of scalar values incorrectly [1]) would be to >>> extend the destination register of the BROADCAST instructions to a >>> vector [2] and then add a case to the switch statement of >>> try_constant_propagate() so that the immediate MOV resulting from >>> opt_algebraic is propagated into surface instructions (see attachment). >> >> I wrote exactly this code to make constant propagate work, plus adding >> the extra opcodes to the switch statement. It works and I could >> certainly send that out after this, but >> >> 1) This doesn't mean we shouldn't do the if (src.file == IMM) >> shortcut. It saves the compiler a bit of work in the very common case >> of >> non-indirect buffer access. >> >> 2) I'm not even sure it makes sense to extend copy-propagation to do >> this (which is why I went back to just the IMM test). Anything that >> would be an immediate at this point should be an immediate, if not >> we're missing something in nir. >> > Still this doesn't address the root of the problem, which is that > emit_uniformize() emits scalar code that the rest of the compiler is not > able to handle properly.
This is not a case of papering over the root cause, it's about not creating the root cause in the first place. The output of emit_uniformize() always ends up as either a surface or a sampler index, which we only look at later in the generator. There are no other cases where the result of emit_uniformize() might be part of an expression that we can copy propagate or otherwise optimize. If the input to emit_uniformize() isn't an immediate where it could be, nir optimization needs fixing. So if we add these two lines to emit_uniformize() to pass immediates straight through, we avoid generating code that we have to extend the copy prop pass to handle. Kristian > Re-implementing a special case of the > optimization already done by opt_algebraic() might have helped in this > specific case, but it won't help when the argument is of some other kind > of uniform value which are also frequently encountered in practice and > could also be copy-propagated, and it won't help avoid the liveness > analysis bug [1] in cases where the argument is not an immediate (the > bug is reported to break some thousands SSBO dEQP tests, although I > don't know what fraction of them actually use non-constant indexing). > The alternative solution I provided patches for (and you seem to have > implemented yourself independently) would address all these issues at > once. > >> Kristian >> >>>> I made this patch a while ago but I never posted it anywhere because >>>> it's a of a kludge and it would probably be better to fix the copy >>>> propagation: >>>> >>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa >>>> >>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will >>> become much easier with the use-def-chain analysis pass I'm working on. >>> >>>> Either way though I don't think it would do any harm to have Kristian's >>>> patch as well even if we did improve elimintate_find_live_channel so it >>>> is: >>>> >>>> Reviewed-by: Neil Roberts <n...@linux.intel.com> >>>> >>>> - Neil >>>> >>>> Kristian Høgsberg Kristensen <k...@bitplanet.net> writes: >>>> >>>>> An immdiate is already uniform so just return it up front. Without this, >>>>> brw_fs_surface_builder ends up passing immediate surface indices through >>>>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't >>>>> constant propagate out of, and further, we don't constant propagate into >>>>> the typed/untype read/write opcodes at all. The end result is that all >>>>> typed/untyped read/write/atomics end up as indirect sends. >>>>> >>>>> Code generation should always produce either an immediate or an actual >>>>> indirect surface index, so we can fix this by just special casing >>>>> immediates in emit_uniformize. >>>>> >>>>> Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net> >>>>> --- >>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> index df10a9d..98ce71e 100644 >>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> @@ -394,6 +394,9 @@ namespace brw { >>>>> const dst_reg chan_index = >>>>> component(vgrf(BRW_REGISTER_TYPE_UD), 0); >>>>> const dst_reg dst = component(vgrf(src.type), 0); >>>>> >>>>> + if (src.file == IMM) >>>>> + return src; >>>>> + >>>>> ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); >>>>> ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); >>>>> >>>>> -- >>>>> 2.6.2 >>>>> >>>>> _______________________________________________ >>>>> mesa-dev mailing list >>>>> mesa-dev@lists.freedesktop.org >>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>> _______________________________________________ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> >>> >>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html >>> [2] >>> http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch >>> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev