On Monday, March 19, 2018 at 9:30:39 AM UTC-7, thepud...@gmail.com wrote:
>
> Hi Ian,
>
> I know you were not giving any type of definitive treatise on how go
> treats atomics across different processors...
>
> but is a related aspect restricting instruction reordering by the compiler
> itself?
>

Yes, the compiler needs to treat atomic loads differently from normal loads with respect to any instruction reordering it does. So although *p and atomic.LoadUint32(p) both compile to a single MOVL on amd64, internally the compiler represents those two operations differently.
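
As a rough illustration of the two forms being compared, here is a minimal sketch. The names plainLoad and atomicLoad are made up for this example; they are not from the runtime or the compiler. Assuming an amd64 build, compiling with go build -gcflags=-S should let you compare the generated code for the two, though the exact output depends on the compiler version and on inlining decisions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// plainLoad (hypothetical name) reads through the pointer with an ordinary
// load. The compiler may reorder, hoist, or combine this load with
// neighbouring memory accesses.
func plainLoad(p *uint32) uint32 {
	return *p
}

// atomicLoad (hypothetical name) reads the same word via sync/atomic.
// The compiler has to keep this load ordered with respect to the
// surrounding memory operations, even if the machine instruction it
// finally emits on amd64 is the same kind of MOVL.
func atomicLoad(p *uint32) uint32 {
	return atomic.LoadUint32(p)
}

func main() {
	x := uint32(42)
	fmt.Println(plainLoad(&x), atomicLoad(&x))
}
```

In a single goroutine both functions return the same value; the difference only matters for what the compiler is allowed to do around the load when other goroutines are writing.
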
> I don't know what the modern go compiler does at this point, but I think
> at least circa go 1.5 there was a nop function that seemed to be used to
> help prevent the compiler from inlining and then doing instruction
> re-ordering (first snippet below), and I think I've seen you make related
> comments more recently (e.g., FreeBSD atomics discussion snippet I included
> at the end of this post)?
>
> I haven't followed the more recent atomics related changes (including it
> seems in 1.10 there might have been some work around intrinsics such as CL
> 28076: "cmd/compile: intrinsify sync/atomic for amd64"?)...
>
> And yes, on the one hand the answer is "respect the memory model and get a
> clean report from the race detector, etc., etc."... but of course sometimes
> the performance aspect of the current compiler does matter beyond just mere
> natural curiosity about how the go compiler does what it does (where
> performance was the context I had looked at this more closely in the past).
>
> Two related snippets:
>
> ====================================================
> from go 1.5
> https://github.com/golang/go/blob/release-branch.go1.5/src/runtime/atomic_amd64x.go#L11
> ====================================================
> // The calls to nop are to keep these functions from being inlined.
> // If they are inlined we have no guarantee that later rewrites of the
> // code by optimizers will preserve the relative order of memory accesses.
>
> //go:nosplit
> func atomicload(ptr *uint32) uint32 {
>         nop()
>         return *ptr
> }
> ====================================================
>
> ====================================================
> Ian Lance Taylor response to question on FreeBSD atomics discussion on
> golang-dev: https://groups.google.com/forum/#!topic/golang-dev/f3PS8hp4Jfs
> ====================================================
>
> > The second issue I have is translating FreeBSD atomic operations to runtime
> > atomic ops.
> > If I understand it correctly then atomic_load_acq_32 has weaker requirements
> > compared to runtime/internal/atomic.Load.
> > On x86 the FreeBSD variant is just a compiler barrier to prevent it
> > from reordering instructions.
>
> The Go compiler does reorder instructions. But it doesn't reorder
> instructions across a non-inlined function call. On x86 a simple
> memory load suffices for atomic.Load because x86 has a fairly strict
> memory order in any case. Most other processors are more lenient, and
> require more work in the atomic operation.
>
> ====================================================
>
> --thepudds
>
> On Monday, March 19, 2018 at 1:55:07 AM UTC-4, Ian Lance Taylor wrote:
>>
>> On Sun, Mar 18, 2018 at 9:47 PM, shivaram via golang-nuts
>> <golan...@googlegroups.com> wrote:
>> >
>> > I noticed that internally, the language implementation seems to rely on the
>> > atomicity of reads to single-word values:
>> >
>> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/runtime/chan.go#L160
>>
>> At the machine level, words like "atomicity" are overloaded with
>> different meanings. I think what you are saying is that the runtime
>> package is assuming that a load of a machine word will never read an
>> interleaving of two different stores of a machine word. It will always
>> read the value written by a single store, though exactly which store
>> it sees is unknown. This is true on all the processors that Go
>> supports.
>>
>> > As I understand it, this atomicity is provided by the cache coherence
>> > algorithms of modern architectures.
>> > Accordingly, the implementations in
>> > sync/atomic of word-sized loads (e.g., LoadUint32 on 386 and LoadUint64 on
>> > amd64) use ordinary MOV instructions:
>> >
>> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_386.s#L146
>> >
>> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L103
>> >
>> > However, word-sized stores on these architectures use special instructions:
>> >
>> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L133
>> >
>> > Given that the APIs being implemented don't provide any global ordering
>> > guarantees, what's the reason they can't be implemented solely with MOV?
>>
>> You are not giving the correct reason for why atomic.LoadUint32 and
>> LoadUint64 can use ordinary MOV instructions on x86 processors. The
>> LoadUint32, etc., functions guarantee much more than that they read a
>> value that is not an interleaving of multiple writes. They are also
>> load-acquire operations, meaning that when the function completes, the
>> caller will see not only the value that was loaded but also all other
>> values that some other processor core wrote before writing to the
>> address being loaded (assuming the write was done using StoreUint32,
>> etc.). It happens that on x86 you can implement load-acquire using a
>> simple MOV instruction. Most other multicore processors use a more
>> complex memory model, and their sync/atomic implementations are
>> accordingly more complex.
>>
>> Ian
>>
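
To make the load-acquire point above a bit more concrete, here is a minimal sketch of the pattern it enables. The function publishAndObserve and the variables data and ready are invented for this example. One goroutine writes an ordinary variable and then sets a flag with atomic.StoreUint32; the other spins on atomic.LoadUint32 and, once it sees the flag, is expected to also see the earlier ordinary write.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAndObserve (hypothetical name) writes value to an ordinary
// variable in one goroutine, publishes it by atomically setting a flag,
// and returns what the calling goroutine reads after it has observed
// the flag through an atomic load.
func publishAndObserve(value int) int {
	var data int     // ordinary, non-atomic variable
	var ready uint32 // flag accessed only through sync/atomic

	go func() {
		data = value                  // plain write, done first
		atomic.StoreUint32(&ready, 1) // publish: makes the write above visible
	}()

	// Load-acquire: once this load returns 1, the write to data that
	// preceded the store must also be visible here.
	for atomic.LoadUint32(&ready) == 0 {
		runtime.Gosched() // yield instead of burning the CPU while waiting
	}
	return data
}

func main() {
	fmt.Println(publishAndObserve(42))
}
```

If the flag were read and written with plain loads and stores instead, this would be a data race, and neither the compiler nor a processor with a weaker memory model than x86 would be obliged to preserve the order of the two writes.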