volatile access optimization (C++ / x86_64)

Matt Godbolt Fri, 26 Dec 2014 12:32:52 -0800

Hi all,

I'm investigating ways to have single-threaded writers write to memory
areas which are then (very infrequently) read from another thread for
monitoring purposes. Things like "number of units of work done".


I initially modeled this with relaxed atomic operations. This
generates a "lock xadd" style instruction, as I can't convey that
there are no other writers.
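
A minimal sketch of the single-writer pattern in question, assuming
exactly one thread ever writes the counter. The names units_done,
record_unit and read_units are hypothetical, chosen only for
illustration. Splitting the increment into a separate relaxed load and
relaxed store is not an atomic read-modify-write, so it is only correct
under that single-writer assumption; in the gcc output at the bottom of
this email the equivalent load_and_store_relaxed() compiles without a
lock prefix.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical metric: written by exactly one thread, read rarely by a monitor.
std::atomic<std::uint64_t> units_done{0};

// Writer side: separate relaxed load and relaxed store instead of fetch_add.
// This is NOT an atomic increment; a second writer could lose updates.
inline void record_unit() {
    units_done.store(units_done.load(std::memory_order_relaxed) + 1,
                     std::memory_order_relaxed);
}

// Monitor side: a relaxed load sees some recent value of the counter,
// never a torn one, but with no ordering against other memory.
inline std::uint64_t read_units() {
    return units_done.load(std::memory_order_relaxed);
}
```

The worker would call record_unit() once per unit of work, and the
monitoring thread would call read_units() whenever it wants a snapshot.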

As best I can tell, there's no memory order I can use to express my
usage characteristics. Giving up on the atomics, I tried volatiles.
These are less than ideal as they are less expressive, but in my
case I am not trying to fight the ISA's reordering, only to prevent
the compiler from eliding updates to my shared metrics.

GCC's code generation uses a "load; add; store" for volatiles, instead
of a single "add 1, [metric]".

http://goo.gl/dVzRSq has the example (which is also at the bottom of my email).

Is there a reason why (in principle) the volatile increment can't be
made into a single add? Clang and ICC both emit the same code for the
volatile and non-volatile case.

Thanks in advance for any thoughts on the matter,

Matt

--- example code ---
#include <atomic>
std::atomic<int> a(0);

void base_case() {
  a++;
}

void relaxed() {
  a.fetch_add(1, std::memory_order_relaxed);
}

void load_and_store_relaxed() {
  a.store(a.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

void cast_as_int_ptr() {
  (*(int*)&a) ++;
}

void cast_as_volatile_int_ptr() {
  (*(volatile int*)&a) ++;
}

--- example output (gcc490) ---

base_case():
  lock addl $1, a(%rip)
  ret
relaxed():
  lock addl $1, a(%rip)
  ret
load_and_store_relaxed():
  movl a(%rip), %eax
  addl $1, %eax
  movl %eax, a(%rip)
  ret
cast_as_int_ptr():
  addl $1, a(%rip)
  ret
cast_as_volatile_int_ptr():
  movl a(%rip), %eax
  addl $1, %eax
  movl %eax, a(%rip)
  ret
