On Thu, Apr 21, 2016 at 11:15 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote: > On Thu, Apr 21, 2016 at 10:46 AM, Hans de Goede <hdego...@redhat.com> wrote: >> Hi, >> >> On 21-04-16 16:28, Ilia Mirkin wrote: >>> >>> On Thu, Apr 21, 2016 at 9:55 AM, Hans de Goede <hdego...@redhat.com> >>> wrote: >>>> >>>> combineLd/St would combine, i.e. : >>>> >>>> st u32 # g[$r2+0x0] $r2 >>>> st u32 # g[$r2+0x4] $r3 >>>> >>>> into: >>>> >>>> st u64 # g[$r2+0x0] $r2d >>>> >>>> But this is only valid if r2 contains an 8 byte aligned address, >>>> which is unknown. >>>> >>>> This commit checks for src0 dim 0 not being indirect when combining >>>> loads / stores as combining indirect loads / stores may break alignment >>>> rules. >>> >>> >>> I believe the assumption is that all indirect addresses are 16-byte >>> aligned. This works out for GL, I think. Although hm... I wonder what >>> happens if you have a >>> >>> layout (std430) buffer foo { >>> int x[16]; >>> } >>> >>> And you access x[i], x[i+1], and i == 1. I think we end up doing a ton >>> of size-based validation which might avoid the problem. >>> >>> My concern is that now constbufs will get the same treatment, and for >>> constbufs the alignment is always 16 :( >>> >>> What do you think? Just drop those, or add extra conditionals to allow >>> it for constbufs? >> >> >> I'm not sure we've the alignment guarantee for constbufs, IIRC we lower >> const buf accesses to be indirect because we want to provide more then 8 >> UBO-s, >> right ? So we read the offset from NVC0_CB_AUX_BUF_INFO and then end up with >> e.g.: >> >> ld u64 r2d c7[r1+0x0] >> >> Where r1 contains the offset of the user-buf. But what if the user is >> somehow >> indirectly accessing the userbuf, then we will have added that indirect >> offset >> to r1, and we can no longer assume that we can safely merge the loads >> without >> breaking alignment rules. >> >> I hope I'm making sense here, I'm still a bit unsure about the details how >> this all works. > > Yes, that can happen, except... it can't. The addresses are 64-bit, > and this only happens on Kepler+, which limits constbuf loads to > 64-bit at a time (no 128-bit loads for some unknown reason -- I've > tried to enable it, and it indeed doesn't work, even though nvdisasm > is perfectly happy with it). So ... it can't happen. But in principle > you're right :)
The only way this can happen is if I'm wrong about how std140 packing works. Which isn't too crazy of a bet to make :) The question is what happens if you have layout (std140) uniform buffer { int x[16]; } Is that laid out with a stride of 4, or 16? If 16, we're all good. If 4, then you're right and we can't make that guarantee. I'm pretty sure the answer is 16, since otherwise I don't see how the TGSI would work out. And I'm fairly sure we get this sort of thing right. [And std430 packing can only be used on SSBO's, not on UBO's.] -ilia _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev