On 01/15/2015 12:13 AM, Jakub Jelinek wrote: > On Thu, Jan 15, 2015 at 07:46:18AM +0100, Richard Biener wrote: >>>> That last line means the compiler is free to delete a non-volatile >>>> asm with a memory clobber if that asm is not needed for dataflow. Or >>>> that is how I read it; it is trying to indicate that if you want to >>>> prevent the memory clobber from being deleted (together with the rest >>>> of the asm), you need to make the asm volatile. >>>> >>>> So as far as I can see the compiler can CSE two identical >>> non-volatile >>>> asms with memory clobber just fine. Older GCC (I tried 4.7.2) does >>> do >>>> this; current mainline doesn't. I think it should. >>> No, it should not CSE those two cases. That's simply wrong and if an >>> older version did that optimization, that's a bug. >> >> I think segher has a point here. If the asm with memory clobber would store >> to random memory and the point would be to preserve that then the whole >> distinction with volatile doesn't make much sense (after all without >> volatile we happily DCE such asm if the regular outputs are not needed). >> >> This doesn't mean 'memory' is a well-designed thing, of course. Just its >> effects are effectively limited to reads without volatile(?) > > Segher's mails talk about "memory" being a write but not read. > If we even can't agree on what non-volatile "memory" means, I think > we should treat it more conservatively, because every user (and there are > lots of people using non-volatile asm with "memory" in the wild) treats it > differently. Just trying to grep for a few: > glibc: > ./sysdeps/alpha/bits/atomic.h:# define atomic_full_barrier() __asm ("mb" : : > : "memory"); > ./sysdeps/alpha/bits/atomic.h:# define atomic_read_barrier() __asm ("mb" : : > : "memory"); > ./sysdeps/alpha/bits/atomic.h:# define atomic_write_barrier() __asm ("wmb" : > : : "memory"); > ./sysdeps/sparc/sparc32/bits/atomic.h:# define atomic_full_barrier() __asm > ("" ::: "memory") > ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() > __asm ("lwsync" ::: "memory") > ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() > __asm ("sync" ::: "memory") > ./sysdeps/powerpc/powerpc64/bits/atomic.h:#define atomic_read_barrier() > __asm ("lwsync" ::: "memory") > ./sysdeps/powerpc/bits/atomic.h:#define atomic_full_barrier() __asm ("sync" > ::: "memory") > ./sysdeps/powerpc/bits/atomic.h:#define atomic_write_barrier() __asm > ("eieio" ::: "memory") > ./sysdeps/generic/malloc-machine.h:# define atomic_full_barrier() __asm ("" > ::: "memory")
I think that it's uses like these -- which may well have been written by folks that also work on gcc -- that are proof that we have at least intended to support a memory clobber to be a full read+write barrier, and thus we must consider a memory clobber to be both a read and write. (The fact that all of these are automatically volatile and would never be CSEd is beside the point. If the semantics of a memory clobber differ based on the volatile flag on the asm, I think that would be too ill-defined to actually support.) In the interest of progressing wrt the current regression, I think that Jakub's patch should go in as-is for now, and then we iterate on how we think the memory cse ought (or ought not) to occur. As for my own thoughts on whether two non-volatile asms with memory clobbers should be CSE'd, in absence of other stores to memory in between, are complicated and probably not well-formed. I'll think about it some more. r~