https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #11 from David Brown <david at westcontrol dot com> ---
(In reply to Zhaohaifeng from comment #8)
> (In reply to David Brown from comment #7)
> > (In reply to Xi Ruoyao from comment #6)

> > Anyway, I cannot see any reason why -fno-common should result in the
> > slower run-times the OP saw (though I have only looked at current gcc
> > versions).  I haven't seen any differences in the code generated for
> > -fcommon and -fno-common on the x86-64.  And my experience on other targets
> > is that -fcommon allows optimisations that cannot be done with -fno-common,
> > thus giving faster code.
> > 
> > I have not, however, seen the OP's real code - I've just made small tests.
> 
> The difference generated for -fcommon and -fno-common is just the global
> variable order in memory address.
> 
> -fcommon is as following (some special order):
> stderr@GLIBC_2.2.5
> completed.0
> Begin_Time
...
> -fno-common is as following (reversed order of source code):
> stderr@GLIBC_2.2.5
> completed.0
> Dhrystones_Per_Second
> Microseconds
> User_Time
...

A change in the order is not unexpected.  But it is hard to believe that this
alone would make as big a difference to the speed of the code as you describe
- it would have to involve particularly unlucky cache issues.

On the x86-64, defined variables appear to be allocated in the reverse order
from the source code unless there are overriding reasons to change that.  I
don't know why that is the case.  You can avoid this by using the
"-fno-toplevel-reorder" switch.  I don't know how common variables are
allocated - that may depend on ordering in the code, or linker scripts, or
declarations in headers.
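
One way to check the ordering on a given toolchain is to print the addresses
of a few globals and compare builds with and without "-fno-toplevel-reorder".
This is only a minimal sketch: the variable and function names are
hypothetical, and the result will depend on the gcc version, target and
linker.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical globals, left uninitialised like typical benchmark variables. */
int first_var;
int second_var;
int third_var;

/* Returns 1 if the addresses ascend in source order, -1 if they descend
   (reverse of source order), and 0 for any other arrangement. */
int address_order(void)
{
    uintptr_t a = (uintptr_t)&first_var;
    uintptr_t b = (uintptr_t)&second_var;
    uintptr_t c = (uintptr_t)&third_var;

    if (a < b && b < c)
        return 1;
    if (a > b && b > c)
        return -1;
    return 0;
}

/* Prints each address and the detected order so two builds can be compared. */
void print_layout(void)
{
    printf("first_var  %p\n", (void *)&first_var);
    printf("second_var %p\n", (void *)&second_var);
    printf("third_var  %p\n", (void *)&third_var);
    printf("order: %d\n", address_order());
}
```

Calling print_layout() from main() in builds made with -fcommon, -fno-common
and -fno-toplevel-reorder should show whether the layout changes between them.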

I have no idea about your program, but one situation where the details of
memory layout can have a big effect is if you have multiple threads, and
nominally independent data used by different threads happens to share a cache
line (false sharing).  Access patterns to arrays and structs can also have
different effects depending on the alignment of the data to cache lines.
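
To illustrate the cache line sharing issue, here is a small sketch that
compares two struct layouts.  It assumes 64-byte cache lines (typical for
current x86-64 processors, but an assumption here), and the struct and
function names are hypothetical.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

/* Two counters, each intended for a different thread.  They sit next to
   each other, so they will normally land in the same cache line. */
struct packed_pair {
    long a;
    long b;
};

/* The same counters, each forced onto its own cache line. */
struct padded_pair {
    _Alignas(CACHE_LINE) long a;
    _Alignas(CACHE_LINE) long b;
};

static_assert(offsetof(struct padded_pair, b) == CACHE_LINE,
              "second member should start on the next cache line");

/* Returns 1 if two member offsets fall in the same cache line, assuming the
   enclosing object itself starts on a cache-line boundary. */
int same_line(size_t off1, size_t off2)
{
    return off1 / CACHE_LINE == off2 / CACHE_LINE;
}
```

With the packed layout a write to one counter invalidates the line holding the
other; the padded layout costs 128 bytes per pair but keeps them apart.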

So you might try "-fno-toplevel-reorder" to have tighter control of the
ordering.  It may also be worth adding cacheline-sized _Alignas specifiers to
some objects, particularly bigger or critical structs or arrays.  (If you are
using a C standard prior to C11, gcc's __attribute__((aligned(XXX))) can be
used.)
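
As a rough sketch of both spellings, again assuming a 64-byte cache line and
using hypothetical object names:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>  /* uintptr_t */

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "alignment must be a power of two");

/* Hypothetical hot data, for illustration only. */
struct counters {
    long hits;
    long misses;
};

/* C11: start each object on a cache-line boundary. */
_Alignas(CACHE_LINE) struct counters hot_counters;
_Alignas(CACHE_LINE) int lookup_table[256];

/* gcc extension, usable with C standards prior to C11. */
struct counters legacy_counters __attribute__((aligned(CACHE_LINE)));

/* Returns 1 if p lies on a cache-line boundary, 0 otherwise. */
int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

This fixes where each object starts relative to a cache line, whatever order
the compiler and linker choose for the objects themselves.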
