Jan Vesely <[email protected]> writes: > On Sat, 2016-04-16 at 10:19 -0500, Nicolai Hähnle wrote: >> On 15.04.2016 17:12, Francisco Jerez wrote: >> > >> > > >> > > > >> > > > > >> > > > > > >> > > > > > For a test doing almost the same thing but not relying on >> > > > > > unspecified >> > > > > > invocation ordering, see >> > > > > > "tests/spec/arb_shader_image_load_store/shader-mem- >> > > > > > barrier.c" -- It >> > > > > > would be interesting to see whether you can get it to >> > > > > > reproduce the GCN >> > > > > > coherency bug using different framebuffer size and modulus >> > > > > > parameters. >> > > > > I tried that, but couldn't reproduce. Whether I just wasn't >> > > > > thorough >> > > > > enough/"unlucky" or whether the in-order nature of the >> > > > > hardware and L1 >> > > > > cache behavior just makes it impossible to fail the shader- >> > > > > mem-barrier >> > > > > test, I'm not sure. >> > > > > >> > > > Now I'm curious about the exact nature of the bug ;), some sort >> > > > of >> > > > missing L1 cache-flushing which could potentially affect >> > > > dependent >> > > > invocations? >> > > I'm not sure I remember everything, to be honest. >> > > >> > > One issue that I do remember is that load/store by default go >> > > through >> > > L1, but atomics _never_ go through L1, no matter how you compile >> > > them. >> > > This means that if you're working on two different images, one >> > > with >> > > atomics and the other without, then the atomic one will always >> > > behave >> > > coherently but the other one won't unless you explicitly tell it >> > > to. >> > > >> > > Now that I think about this again, there should probably be a >> > > shader-mem-barrier-style way to test for that particular issue in >> > > a way >> > > that doesn't depend on the specifics of the parallelization. >> > > Something >> > > like, in a loop: >> > > >> > > Thread 1: increasing imageStore into image 1 at location 1, >> > > imageLoad >> > > from image 1 location 2 >> > > >> > > Thread 2: same, but exchange locations 1 and 2 >> > > >> > > Both threads: imageAtomicAdd on the same location in image 2 >> > > >> > > Then each thread can check that _if_ the imageAtomicAdd detects >> > > the >> > > buddy thread operating in parallel, _then_ they must also observe >> > > incrementing values in the location that the buddy thread stores >> > > to. >> > > Does that sound reasonable? >> > > >> > Yeah, that sounds reasonable, but keep in mind that even if both >> > image >> > variables are marked coherent you cannot make assumptions about the >> > ordering of the image stores performed on image 1 relative to the >> > atomics performed on image 2 unless there is an explicit barrier in >> > between, which means that some level of L1 caching is legitimate >> > even in >> > that scenario (and might have some performance benefit over >> > skipping L1 >> > caching of coherent images altogether) -- That's in fact the way >> > that >> > the i965 driver implements coherent image stores: We just write to >> > L1 >> > and flush later on to the globally coherent L3 on the next >> > memoryBarrier(). >> Okay, adding the barrier makes sense. >> >> >> > >> > What about a test along the lines of the current coherency >> > test? Any >> > idea what's the reason you couldn't get it to reproduce the >> > issue? Is >> > it because threads with dependent inputs are guaranteed to be >> > spawned in >> > the same L1 cache domain as the threads that generated their inputs >> > or >> > something like that? 
>>
>> From what I understand (though admittedly the documentation I have
>> on this is not the clearest...), the hardware flushes the L1 cache
>> automatically at the end of each shader invocation, so that
>> dependent invocations are guaranteed to pick it up.
>
> The GCN whitepaper mentions both that the L1D cache is write-through
> and sends data to L2 at the end of all 64 WF stores (page 9), and
> that the L1 cache writes back data at the end of a wavefront or when
> a barrier is invoked (page 10).
>
> Jan
Interesting... In that case I wonder whether there actually was a bug
to test for, or whether you could just call the non-coherent behavior
of L1 writes a feature? ;)

Nicolai, do I have your Acked-by for the revert?
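FWIW, here's roughly what I'd imagine the shader side of the test you
describe above could look like.  Completely untested, all uniform
names and constants are made up, and the exact pass condition is
hand-waved in the comments -- it would need more careful thought in
the real thing:

   #version 150
   #extension GL_ARB_shader_image_load_store : require

   /* Sketch of the proposed test, not an actual piglit shader.  The
    * C harness would be responsible for pairing up invocations and
    * assigning 'role', 'loc' and 'ticket_loc'. */
   layout(r32ui) coherent uniform uimage2D data;    /* image 1 */
   layout(r32ui) coherent uniform uimage2D tickets; /* image 2 */

   uniform int role;         /* 0 or 1, distinguishes the two buddies */
   uniform ivec2 loc[2];     /* the two locations within image 1 */
   uniform ivec2 ticket_loc; /* shared location within image 2 */

   out vec4 color;

   void main()
   {
      const uint n = 1024u;
      ivec2 mine = loc[role];
      ivec2 buddy = loc[1 - role];
      uint prev_ticket = imageAtomicAdd(tickets, ticket_loc, 1u);
      uint prev_value = imageLoad(data, buddy).x;
      bool fail = false;

      for (uint i = 1u; i <= n; i++) {
         /* Monotonically increasing store to our own location... */
         imageStore(data, mine, uvec4(i));

         /* ...which has to be made globally visible before the
          * atomic below, hence the explicit barrier. */
         memoryBarrier();

         uint t = imageAtomicAdd(tickets, ticket_loc, 1u);
         uint v = imageLoad(data, buddy).x;

         /* Each of our own iterations bumps the ticket by one, so a
          * larger delta proves the buddy was running concurrently.
          * Requiring at least two buddy atomics in between should
          * guarantee that at least one of its stores became visible
          * after our previous load, so we must observe its value
          * making progress. */
         if (t - prev_ticket >= 3u && v <= prev_value)
            fail = true;

         prev_ticket = t;
         prev_value = v;
      }

      color = fail ? vec4(1, 0, 0, 1) : vec4(0, 1, 0, 1);
   }

Sweeping the surrounding harness over different framebuffer sizes and
modulus parameters the same way shader-mem-barrier.c does would then
hopefully give the L1 some chance to misbehave.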
