On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley <a...@redhat.com> wrote:
> On 26/12/14 20:32, Matt Godbolt wrote:
>> Is there a reason why (in principal) the volatile increment can't be
>> made into a single add? Clang and ICC both emit the same code for the
>> volatile and non-volatile case.
>
> Yes. Volatiles use the "as if" rule, where every memory access is as
> written. a volatile increment is defined as a load, an increment, and
> a store.

That makes sense to me from a logical point of view. My understanding,
though, is that the volatile keyword is mainly used when working with
memory-mapped devices, where memory loads and stores must not be elided.
A single-instruction load-modify-write like "increment [addr]" adheres to
these constraints even though it is a single instruction. I realise my
understanding could be wrong here! If it is wrong, though, both clang and
icc are taking a short-cut that may put them in a non-compliant state.

> If you want single atomic increment, atomics are what you
> should use. If you want an increment to be written to memory, use a
> store barrier after the increment.

Thanks. I realise I was unclear in my original email. I'm really looking
for a way to say "do a non-lock-prefixed increment". Atomics are too
strong and enforce a bus lock. Doing a store barrier after the increment
also appears heavy-handed: while I wish for eventual consistency with
memory, I do not require it. I do, however, need the compiler not to move
or elide my increment.

At the moment I think the best I can do is to use an inline assembly
version of the increment, which prevents GCC from doing any optimisation
upon it. That seems rather ugly though, and if anyone has any better
suggestions I'd be very grateful.

To give a concrete example:

    uint64_t num_done = 0;

    void process_work() { /* does something somewhat expensive */ }

    void worker_thread(int num_work) {
      for (int i = 0; i < num_work; ++i) {
        process_work();
        num_done++;  // ideally a relaxed atomic increment here
      }
    }

    void reporting_thread() {
      while (true) {
        sleep(60);
        // ideally a relaxed read here
        printf("worker has done %llu\n", (unsigned long long)num_done);
      }
    }

In the non-atomic case above, no locked instructions are used. Given
enough information about what process_work() does, the compiler can
realise that num_done can be added to outside of the loop
(num_done += num_work), which is the part I'd like to avoid.

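For comparison, here is a minimal sketch of the relaxed-atomic variant of the same example, assuming C11 <stdatomic.h>. The helpers read_num_done() and report_once() are names I have made up for illustration, and process_work() is an empty stub standing in for the real work. It is only meant to show what "atomic with relaxed ordering" looks like in source, not to suggest a fix.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Shared counter: incremented by the worker, read by the reporter. */
static _Atomic uint64_t num_done = 0;

/* Hypothetical stand-in for the somewhat expensive work. */
static void process_work(void) {}

void worker_thread(int num_work) {
    for (int i = 0; i < num_work; ++i) {
        process_work();
        /* Relaxed atomic read-modify-write: atomic, but imposes no
         * ordering on surrounding memory accesses. On x86 this is still
         * emitted as a lock-prefixed add. */
        atomic_fetch_add_explicit(&num_done, 1, memory_order_relaxed);
    }
}

/* Relaxed read of the counter (hypothetical helper name). */
uint64_t read_num_done(void) {
    return atomic_load_explicit(&num_done, memory_order_relaxed);
}

/* One iteration of the reporting loop, without the sleep. */
void report_once(void) {
    printf("worker has done %" PRIu64 "\n", read_num_done());
}
```

Calling worker_thread(n) and then read_num_done() from the same thread returns the running total. As far as I can tell GCC does not currently fold the relaxed increments into a single addition outside the loop, although I am not aware of anything that forbids it in principle.
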
By making the int atomic and using relaxed memory ordering, I get this
guarantee, but at the cost of a "lock addl".

Thanks in advance for any ideas,

Matt