Hi all,

I'm investigating ways to have single-threaded writers write to memory areas which are then (very infrequently) read from another thread for monitoring purposes: things like "number of units of work done".
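A minimal sketch of this setup, for concreteness. The names units_done, record_unit and run_demo are hypothetical and not taken from any real codebase. The increment uses a relaxed load followed by a relaxed store, which is only valid under the assumption that exactly one thread ever writes the counter.

```cpp
#include <atomic>
#include <cstdlib>
#include <thread>

// Hypothetical metric: written by exactly one thread, read rarely by a monitor.
std::atomic<long> units_done{0};

// Single-writer increment: relaxed load, then relaxed store.
// ASSUMPTION: no other thread ever writes units_done; with a second
// writer this would lose updates.
inline void record_unit() {
    units_done.store(units_done.load(std::memory_order_relaxed) + 1,
                     std::memory_order_relaxed);
}

// Do n units of work on the calling thread while a monitor thread polls.
// Returns the final counter value.
long run_demo(long n) {
    units_done.store(0, std::memory_order_relaxed);
    std::atomic<bool> stop{false};

    std::thread monitor([&stop] {
        long last = 0;
        while (!stop.load(std::memory_order_relaxed)) {
            long seen = units_done.load(std::memory_order_relaxed);
            // The writer only increments, so successive reads of the same
            // atomic must never go backwards.
            if (seen < last) std::abort();
            last = seen;
            std::this_thread::yield();
        }
    });

    for (long i = 0; i < n; ++i) record_unit();

    stop.store(true, std::memory_order_relaxed);
    monitor.join();
    return units_done.load(std::memory_order_relaxed);
}
```

Calling run_demo(n) returns n, since the calling thread is the only writer. In the GCC output quoted in this message, this load-then-store form compiles to a plain load, add, store sequence with no lock prefix.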
I initially modeled this with relaxed atomic operations. This generates a "lock xadd" style instruction ("lock addl" in the output below), as I can't convey that there are no other writers. As best I can tell, there's no memory order I can use to describe my usage characteristics.

Giving up on the atomics, I tried volatiles. These are less than ideal as they are less expressive, but in my case I am not trying to fight the ISA's reordering; I just want to prevent the compiler from eliding updates to my shared metrics. GCC's code generation uses a "load; add; store" for volatiles, instead of a single "add 1, [metric]".

http://goo.gl/dVzRSq has the example (which is also at the bottom of my email).

Is there a reason why (in principle) the volatile increment can't be made into a single add? Clang and ICC both emit the same code for the volatile and non-volatile case.

Thanks in advance for any thoughts on the matter,

Matt

--- example code ---

#include <atomic>

std::atomic<int> a(0);

void base_case() { a++; }

void relaxed() {
  a.fetch_add(1, std::memory_order_relaxed);
}

void load_and_store_relaxed() {
  a.store(a.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

void cast_as_int_ptr() { (*(int*)&a) ++; }

void cast_as_volatile_int_ptr() { (*(volatile int*)&a) ++; }

--- example output (gcc490) ---

base_case():
        lock addl $1, a(%rip)
        ret
relaxed():
        lock addl $1, a(%rip)
        ret
load_and_store_relaxed():
        movl a(%rip), %eax
        addl $1, %eax
        movl %eax, a(%rip)
        ret
cast_as_int_ptr():
        addl $1, a(%rip)
        ret
cast_as_volatile_int_ptr():
        movl a(%rip), %eax
        addl $1, %eax
        movl %eax, a(%rip)
        ret