https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532
--- Comment #11 from David Brown <david at westcontrol dot com> ---
(In reply to Zhaohaifeng from comment #8)
> (In reply to David Brown from comment #7)
> > (In reply to Xi Ruoyao from comment #6)
> > Anyway, I cannot see any reason while -fno-common should result in the
> > slower run-times the OP saw (though I have only looked at current gcc
> > versions). I haven't seen any differences in the code generated for
> > -fcommon and -fno-common on the x86-64. And my experience on other targets
> > is that -fcommon allows optimisations that cannot be done with -fno-common,
> > thus giving faster code.
> >
> > I have not, however, seen the OP's real code - I've just made small tests.
>
> The difference generated for -fcommon and -fno-common is just the global
> variable order in memory address.
>
> -fcommon is as following (some special order):
> stderr@GLIBC_2.2.5
> completed.0
> Begin_Time
> ...
> -fno-common is as following (reversed order of source code):
> stderr@GLIBC_2.2.5
> completed.0
> Dhrystones_Per_Second
> Microseconds
> User_Time
> ...

A change in the order is not unexpected. But it is hard to believe that this
alone would change the speed of the code by as much as you describe - it would
have to involve particularly unlucky cache issues.

On the x86-64, defined variables appear to be allocated in the reverse order
from the source code unless there are overriding reasons to change that. I
don't know why that is the case. You can avoid this by using the
"-fno-toplevel-reorder" switch. I don't know how common variables are
allocated - that may depend on ordering in the code, or linker scripts, or
declarations in headers.

I have no idea about your program, but one situation where the details of
memory layout can have a big effect is if you have multiple threads, and
nominally independent data used by multiple threads happens to share a cache
line.
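
As a rough illustration (not taken from the OP's program), a sketch along
these lines can show where globals end up in a given build and whether two of
them land in the same cache line. The variable names are hypothetical, and the
64-byte line size is an assumption that is typical for x86-64 but should be
checked for the actual CPU.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed cache line size: typical for x86-64, but not guaranteed. */
#define CACHE_LINE 64

/* Hypothetical globals standing in for the program's own variables.
   They have no initialiser, so they are tentative definitions, which
   is the kind of variable that -fcommon places in common storage. */
long counter_a;      /* imagine this is written by one thread */
long counter_b;      /* imagine this is written by another thread */
double timings[4];

/* Returns 1 if both addresses fall in the same CACHE_LINE-sized block. */
int same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / CACHE_LINE == (uintptr_t)q / CACHE_LINE;
}

/* Prints where each global ended up, so that builds can be compared. */
void report_layout(void)
{
    printf("counter_a at %p\n", (void *)&counter_a);
    printf("counter_b at %p\n", (void *)&counter_b);
    printf("timings   at %p\n", (void *)timings);
    printf("counter_a and counter_b share a cache line: %d\n",
           same_cache_line(&counter_a, &counter_b));
}
```

Calling report_layout() from main in builds made with -fcommon, with
-fno-common, and with -fno-toplevel-reorder added should make any difference
in ordering visible. "nm -n" on the executable lists the symbols sorted by
address and gives the same information without running the program.
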
Access patterns to arrays and structs can also have different effects
depending on the alignment of the data to cache lines.

So you might try "-fno-toplevel-reorder" to have tighter control of the
ordering. It may also be worth adding cacheline-sized _Alignas specifiers to
some objects, particularly bigger or critical structs or arrays. (If you are
using a C standard prior to C11, gcc's __attribute__((aligned(XXX))) can be
used.)
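
A minimal sketch of what that could look like, assuming a 64-byte cache line.
The struct and array names are made up for illustration and are not from the
OP's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; it is hard-coded here, not queried. */
#define CACHE_LINE 64

/* C11: each member starts on its own cache line, so two threads that
   each update one member do not write to the same line. */
struct per_thread_counters {
    _Alignas(CACHE_LINE) long a;
    _Alignas(CACHE_LINE) long b;
};

/* The same request using the gcc attribute, for pre-C11 code. */
struct per_thread_counters_gnu {
    long a __attribute__((aligned(CACHE_LINE)));
    long b __attribute__((aligned(CACHE_LINE)));
};

/* A larger array that starts at a cache-line boundary. */
_Alignas(CACHE_LINE) double samples[1024];

/* Compile-time check that the second member really is one line further on. */
_Static_assert(offsetof(struct per_thread_counters, b) == CACHE_LINE,
               "member b should start on the next cache line");
```

The cost is memory: each of these structs takes two full cache lines instead
of two longs, so this is only worthwhile for the few objects that matter.
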