Comments inline below vvvvv


Tomash Brechko wrote:
Consider this piece of code:

    extern int v;
void
    f(int set_v)
    {
      if (set_v)
        v = 1;
    }

    f:
            pushl   %ebp
            movl    %esp, %ebp
            cmpl    $0, 8(%ebp)
            movl    $1, %eax
            cmove   v, %eax        ; load (maybe)
            movl    %eax, v        ; store (always)
            popl    %ebp
            ret

Note the last unconditional store to v.  Now, if some thread would
modify v between our load and store (acquiring the mutex first), then
we will overwrite the new value with the old one (and would do that in
a thread-unsafe manner, not acquiring the mutex).

So, do the calls to f(0) require the mutex, or it's a GCC bug?

The "unintended write-access" optimization is a massive headache for developers of multi-threaded code.


The problem here is the mandatory write access to a memory location for which the as-written code path does not indicate a write memory access should occur.

This is a tricky one, optimizations which have the effect of causing an "unintended write access to some resource" when the code path does not intend this to happen crosses a line IMHO.

I think that GCC should understand where that line is and have a compile time parameter to configure if that line maybe crossed. Its a matter for debate as to what the default should be and/or -O6 should allow that line to be crossed, but having no mechanism to control it is the real bummer.

Even if the interpretation offered of the C language standards specification says the line maybe be crossed, from a practical point of view this is one aspect of optimization that a developer would want to have complete control over.

So much control that I would also like to see a pair of __attribute__((optimization_hint_keywords)) attached to the variable declaration to provide fine grain control. Such a solution to the problem would keep everybody happy.


Here are some pieces from C99:

...SNIP...
Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
               objects.

Hmm... on this point there can be a problem. There are 2 major types of access read from memory (load) and write to memory (store). It is very possible to end up performing an optimistic read; only to throw away the value contained due to a compare/jump. This is usually considered a safe optimization.

But reading the statement above as-is and in the context of this problem might make some believe this "optimistic read" optimization is breaking the rules.


Maybe in GCC there should be C99 adherence levels :

strict mode: Where this C99 clause is adhered to, but this is much like compiling code without optimization, like when debugging. Since during debugging you always want nice clear per line / per expression separation so you can walk through execution with a debugger.

may optimize read access mode: This is the normal case for optimization, where you might interleave a 'compare reg with immediate' and a 'load from memory', then perform a 'conditional branch' that ends up at code that never uses the value loaded from memory. The only rare case this is a problem is where a read from special memory, but volatile in GCC exists for that or you could move all accesses to that memory away from regular C language syntax and into a function call.

may optimize read and write access mode: This is the problem case you are seeing. Same as the mode above but also permits the unintended write access, but only to write back the same value as before (based on the compiler's thread naive perception of execution at least!).



So, could someone explain me why this GCC optimization is valid, and,
if so, where lies the boundary below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly during
particular execution path?  Or maybe the named bug report is valid
after all?

As has been pointed out by others there is no specification on what happens between threads.


Your route out of this problem is to write your own implementation of:

atomic_int_set(int *ptr, int value);

Which always uses an atomic single instruction store. Which is thread-safe with respect to ensuring that no other concurrent read or write to that location will ever see a corrupted value. Where a corrupted value in this case would be some value other than "the previous value of 'v'" and "the value of '1'" you are setting, also that once a concurrent access with "the value of '1'" is first obversed, it will not be possible to observe the previous value on a subsequent read (the value doesn't flap about once it changes, it changes for good).

if (set_v)
  atomic_int_set(&v, 1);



By doing the above you are programatically dictating the method of thread-safety in 2 directions.

One direction in terms of something that is agreeable with a compiler and something it can't optimize/change. By using a basic function call invocation to an external symbol the compiler has no room to be able to think about optimization. Since the compiler does not know the side effects that calling this external symbol may have. So it can't reorder this operation either, so it occurred exactly at the moment your code says it should.

The other direction is in terms of the target CPU assembler instructions and architecture. The implementation of your atomic_int_set() will be fixed by expressing the operation directly in assembler (to be sure it always will use a single instruction move register to memory).


It would also be true that you should also have a method for reading the current value of 'v' in an atomic way.

This may also mean you have to create a function:

extern int atomic_int_get(int *);

For the purpose of obtaining the current value. Yes you have to apply the same care when reading the value as you do with setting it.



NB Marking the variable 'volatile' does not mean anything useful in the situation you are in. The exact meaning of what 'volatile' is can be a problem between compilers, but in the case of GCC it can stop the re-ordering and the caching of value in register aspect of your entire problem. But it will never enforce the method used to perform the load/store, not will it (at this time) stop the unintended write-access. Although in the case of an aligned integer of natural bitwidth it is somewhat difficult for the compiler to do the wrong thing on most architectures, as the most efficient instruction is the atomic load/store between register and memory.


Darryl

Reply via email to