On Mon, Oct 22, 2007 at 14:53:41 +0100, Dave Korn wrote:
> The optimisation the compiler is making here is a big win in normal
> code, you wouldn't want to disable it unless absolutely necessary;
> to be precise, you wouldn't want to automatically disable it for
> every loop and variable in a program that used -fopenmp just because
> /some/ of the variables in that program couldn't be safely accessed
> that way.
I wish the optimization were done differently.  Currently
we have:

                                         mem -> reg;
   loop                                  loop
     if (condition)    => optimize =>      if (condition)
       val -> mem;                           val -> reg;
                                         reg -> mem;


But it could use an additional register and be:

                                         0 -> flag_reg;
                                         loop
                                           if (condition)
                                             val -> reg;
                                             1 -> flag_reg;
                                         if (flag_reg == 1)
                                           reg -> mem;
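
In C terms, the three variants look roughly like the sketch below.  The
function names and the `vals[i] < 0` condition are made up for
illustration, and the last two functions are hand-written stand-ins for
what the compiler would emit, not actual GCC output.

```c
#include <stddef.h>

/* Source as written: *mem is stored to only when the condition holds. */
void store_if_negative(int *mem, const int *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (vals[i] < 0)
            *mem = vals[i];          /* val -> mem */
}

/* Stand-in for the current transformation: unconditional load and store. */
void store_speculative(int *mem, const int *vals, size_t n)
{
    int reg = *mem;                  /* mem -> reg */
    for (size_t i = 0; i < n; i++)
        if (vals[i] < 0)
            reg = vals[i];           /* val -> reg */
    *mem = reg;                      /* reg -> mem, even if nothing changed */
}

/* Stand-in for the proposed transformation: no load, store only if needed. */
void store_flagged(int *mem, const int *vals, size_t n)
{
    int reg = 0;
    int flag_reg = 0;                /* 0 -> flag_reg */
    for (size_t i = 0; i < n; i++)
        if (vals[i] < 0) {
            reg = vals[i];           /* val -> reg */
            flag_reg = 1;            /* 1 -> flag_reg */
        }
    if (flag_reg == 1)
        *mem = reg;                  /* reg -> mem */
}
```

All three leave the same value in *mem for a single thread; they differ
only in which memory accesses they perform.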


Note that by doing so we also eliminate all memory accesses when they
are not needed (when the condition is never true), and memory bandwidth
is a major limiting factor nowadays.  Actually, for the very first code
snippet in this thread I'd say that the optimization


                                     mem -> reg;
   if (condition)   => optimize =>   if (condition)
     val -> mem;                       val -> reg;
                                     reg -> mem;

(there's no loop) is actually a counter-optimization even in the
single-threaded case: we replace a branch, which surely has its costs,
with an unconditional memory load and store, which cost much more.  Even
if branching flushes the CPU pipeline even when the jump destination is
already in the pipeline (is this the case?), the memory load has its own
quite big cost, plus the cost of flushing one line from the cache just
to perform a single operation on mem.
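
The no-loop case, and the reason the unconditional store matters for
threads, can be sketched in C as well.  All names here are hypothetical.
`lost_update_demo` does not start any threads: it replays one possible
interleaving of two threads sequentially, so it is deterministic, and it
only illustrates the hazard under the assumption that another thread
may legitimately store to mem while our condition is false.

```c
/* Source as written: touches *mem only if the condition holds. */
void set_if(int *mem, int condition, int val)
{
    if (condition)
        *mem = val;      /* val -> mem */
}

/* Hand-written stand-in for the transformed code: always loads and stores. */
void set_if_speculative(int *mem, int condition, int val)
{
    int reg = *mem;      /* mem -> reg */
    if (condition)
        reg = val;       /* val -> reg */
    *mem = reg;          /* reg -> mem, even when nothing changed */
}

/* One interleaving, replayed sequentially: "thread A" runs the
   transformed code with a false condition while "thread B" stores 42. */
int lost_update_demo(void)
{
    int mem = 0;
    int reg = mem;       /* A: mem -> reg */
    mem = 42;            /* B: legitimate store to mem */
    mem = reg;           /* A: reg -> mem writes the stale 0 back */
    return mem;          /* B's 42 has been lost */
}
```

With the source as written, thread A would never have touched mem in
that interleaving, and B's store would have survived.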

So, why not use flag_reg and thus make GCC thread-aware for this case?
I read the article suggested by Andrew Haley; its main point is that
the compiler should be made thread-aware.  Making all shared objects
volatile is overkill, and is more a trick than a solution.


-- 
   Tomash Brechko
