On Mon, Mar 15, 2021 at 04:38:52PM +0000, David Laight wrote:
> From: Rasmus Villemoes
> > Sent: 15 March 2021 16:24
> > On 12/03/2021 03.29, Segher Boessenkool wrote:
> > > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> > >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> > >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> > >> generates a call to _restgpr_31_x.
> > >
> > >> I don't know if there is a way to tell GCC not to emit that call,
> > >> because at the end we get more instructions than needed.
> > >
> > > The function is required by the ABI, you need to have it.
> > >
> > > You get *fewer* insns statically, and that is what -Os is about: reduce
> > > the size of the binaries.
> >
> > Is there any reason to not just always build the vdso with -O2? It's one
> > page/one VMA either way, and the vdso is about making certain system
> > calls cheaper, so if unconditional -O2 could save a few cycles compared
> > to -Os, why not? (And if, as it seems, there's only one user within the
> > DSO of _restgpr_31_x, yes, the overall size of the .text segment
> > probably increases slightly).
>
> Sometimes -Os generates such horrid code you really never want to use it.
> A classic is on x86 where it replaces 'load register with byte constant'
> with 'push byte' 'pop register'.
> The code is actually smaller but the execution time is horrid.
>
> There are also cases where -O2 actually generates smaller code.

Yes, as with all heuristics it doesn't always work out.  But usually -Os
is smaller.

> Although you may need to disable loop unrolling (often dubious at best)
> and either force or disable some function inlining.

The cases where GCC does loop unrolling at -O2 always help quite a lot.
Or, do you have a counter-example?  We'd love to see one.

And yup, inlining is hard.  GCC's heuristics there are very good
nowadays, but any single decision has big effects.  Doing the important
spots manually (always_inline or noinline) has good payoff.


Segher
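
A minimal sketch of the manual inlining control mentioned above, assuming
GCC (or a compatible compiler such as Clang).  The function names are
hypothetical and only illustrate the two attributes; they are not taken
from the vdso code under discussion.

```c
#include <stdint.h>

/*
 * Hypothetical hot helper: small and called in a loop, so force it
 * inline even where the size heuristics (e.g. at -Os) might decline.
 */
static inline __attribute__((always_inline))
uint32_t add_saturate(uint32_t a, uint32_t b)
{
	uint32_t sum = a + b;

	return sum < a ? UINT32_MAX : sum;	/* wrapped around: clamp */
}

/*
 * Hypothetical cold path: keep it out of line so its body is not
 * duplicated into every caller.
 */
static __attribute__((noinline))
uint32_t clamp_slow(uint32_t x, uint32_t limit)
{
	return x > limit ? limit : x;
}

uint32_t sum_clamped(const uint32_t *v, int n, uint32_t limit)
{
	uint32_t total = 0;

	for (int i = 0; i < n; i++)
		total = add_saturate(total, v[i]);	/* inlined into the loop */

	if (total > limit)
		total = clamp_slow(total, limit);	/* stays a real call */

	return total;
}
```

Whether either attribute pays off in a given spot still has to be checked
by looking at the generated code and measuring it.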