Comments inline below vvvvv
Tomash Brechko wrote:
Consider this piece of code:
extern int v;
void
f(int set_v)
{
if (set_v)
v = 1;
}
f:
pushl %ebp
movl %esp, %ebp
cmpl $0, 8(%ebp)
movl $1, %eax
cmove v, %eax ; load (maybe)
movl %eax, v ; store (always)
popl %ebp
ret
Note the last unconditional store to v. Now, if some thread would
modify v between our load and store (acquiring the mutex first), then
we will overwrite the new value with the old one (and would do that in
a thread-unsafe manner, not acquiring the mutex).
So, do the calls to f(0) require the mutex, or it's a GCC bug?
The "unintended write-access" optimization is a massive headache for
developers of multi-threaded code.
The problem here is the mandatory write access to a memory location for
which the as-written code path does not indicate a write memory access
should occur.
This is a tricky one, optimizations which have the effect of causing an
"unintended write access to some resource" when the code path does not
intend this to happen crosses a line IMHO.
I think that GCC should understand where that line is and have a compile
time parameter to configure if that line maybe crossed. Its a matter
for debate as to what the default should be and/or -O6 should allow that
line to be crossed, but having no mechanism to control it is the real
bummer.
Even if the interpretation offered of the C language standards
specification says the line maybe be crossed, from a practical point of
view this is one aspect of optimization that a developer would want to
have complete control over.
So much control that I would also like to see a pair of
__attribute__((optimization_hint_keywords)) attached to the variable
declaration to provide fine grain control. Such a solution to the
problem would keep everybody happy.
Here are some pieces from C99:
...SNIP...
Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
objects.
Hmm... on this point there can be a problem. There are 2 major types of
access read from memory (load) and write to memory (store). It is very
possible to end up performing an optimistic read; only to throw away the
value contained due to a compare/jump. This is usually considered a
safe optimization.
But reading the statement above as-is and in the context of this problem
might make some believe this "optimistic read" optimization is breaking
the rules.
Maybe in GCC there should be C99 adherence levels :
strict mode: Where this C99 clause is adhered to, but this is much like
compiling code without optimization, like when debugging. Since during
debugging you always want nice clear per line / per expression
separation so you can walk through execution with a debugger.
may optimize read access mode: This is the normal case for optimization,
where you might interleave a 'compare reg with immediate' and a 'load
from memory', then perform a 'conditional branch' that ends up at code
that never uses the value loaded from memory. The only rare case this
is a problem is where a read from special memory, but volatile in GCC
exists for that or you could move all accesses to that memory away from
regular C language syntax and into a function call.
may optimize read and write access mode: This is the problem case you
are seeing. Same as the mode above but also permits the unintended
write access, but only to write back the same value as before (based on
the compiler's thread naive perception of execution at least!).
So, could someone explain me why this GCC optimization is valid, and,
if so, where lies the boundary below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly during
particular execution path? Or maybe the named bug report is valid
after all?
As has been pointed out by others there is no specification on what
happens between threads.
Your route out of this problem is to write your own implementation of:
atomic_int_set(int *ptr, int value);
Which always uses an atomic single instruction store. Which is
thread-safe with respect to ensuring that no other concurrent read or
write to that location will ever see a corrupted value. Where a
corrupted value in this case would be some value other than "the
previous value of 'v'" and "the value of '1'" you are setting, also that
once a concurrent access with "the value of '1'" is first obversed, it
will not be possible to observe the previous value on a subsequent read
(the value doesn't flap about once it changes, it changes for good).
if (set_v)
atomic_int_set(&v, 1);
By doing the above you are programatically dictating the method of
thread-safety in 2 directions.
One direction in terms of something that is agreeable with a compiler
and something it can't optimize/change. By using a basic function call
invocation to an external symbol the compiler has no room to be able to
think about optimization. Since the compiler does not know the side
effects that calling this external symbol may have. So it can't reorder
this operation either, so it occurred exactly at the moment your code
says it should.
The other direction is in terms of the target CPU assembler instructions
and architecture. The implementation of your atomic_int_set() will be
fixed by expressing the operation directly in assembler (to be sure it
always will use a single instruction move register to memory).
It would also be true that you should also have a method for reading the
current value of 'v' in an atomic way.
This may also mean you have to create a function:
extern int atomic_int_get(int *);
For the purpose of obtaining the current value. Yes you have to apply
the same care when reading the value as you do with setting it.
NB Marking the variable 'volatile' does not mean anything useful in the
situation you are in. The exact meaning of what 'volatile' is can be a
problem between compilers, but in the case of GCC it can stop the
re-ordering and the caching of value in register aspect of your entire
problem. But it will never enforce the method used to perform the
load/store, not will it (at this time) stop the unintended write-access.
Although in the case of an aligned integer of natural bitwidth it is
somewhat difficult for the compiler to do the wrong thing on most
architectures, as the most efficient instruction is the atomic
load/store between register and memory.
Darryl