Nikhil Benesch noticed that changes in the GCC backend were making
deferred functions that call recover less efficient. A defer thunk is a
compiler-generated function whose entire body looks like this:
    if !runtime.setdeferretaddr(&L) {
        deferredFunction()
    }
  L:

The idea is that the address of the label passed to setdeferretaddr is
the address to which deferredFunction returns. The code in canrecover
compares the return address of the function to this saved address to
see whether the recover function can return non-nil. This is explained
in marginally more detail at https://www.airs.com/blog/archives/376 .
When the return address does not match, the canrecover code falls back
to a more costly check that requires unwinding the stack.

What Nikhil Benesch noticed is that we were always taking that
fallback. It turned out that the label address passed to
setdeferretaddr was not the label to which the deferred function
returned. That was because the bb-reorder pass was duplicating the
epilogue: the label was moved to one copy of the epilogue while the
deferred function returned to the other.

Of course there is no reason to duplicate the epilogue in such a small
function. One easy way to disable that duplication is to compile the
function with -Os, which is what this patch does. The patch compiles
all thunks with -Os, not just defer thunks, but since all thunks are
small that does no harm.
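To make the fast path concrete, here is a rough sketch of the check
described above. It is only an illustration of the idea: the names
(deferRecord, canRecover, canRecoverSlow) are invented and do not
correspond to the actual libgo implementation.

    // Illustrative sketch only; names and layout are invented and do
    // not match the real libgo code.
    package sketch

    // deferRecord stands in for the per-defer state the runtime keeps.
    type deferRecord struct {
            retaddr uintptr // label address saved by setdeferretaddr
    }

    // canRecoverSlow stands in for the costly fallback that unwinds
    // the stack.
    func canRecoverSlow(retaddr uintptr) bool { return false }

    // canRecover reports whether recover may return non-nil.
    func canRecover(d *deferRecord, retaddr uintptr) bool {
            // Fast path: the thunk's return address matches the label
            // address it registered via setdeferretaddr.
            if retaddr == d.retaddr {
                    return true
            }
            // Slow path: unwind the stack to decide. This is the
            // fallback that every call was hitting once the label was
            // moved to the wrong copy of the epilogue.
            return canRecoverSlow(retaddr)
    }

With the label attached to the wrong copy of the epilogue, the equality
test above could never succeed, so every recover check paid for the
stack unwind.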
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to
mainline.

Ian


2019-02-13  Ian Lance Taylor  <i...@golang.org>

        * go-gcc.cc: #include "opts.h".
        (Gcc_backend::function): Compile thunks with -Os.

Index: go-gcc.cc
===================================================================
--- go-gcc.cc   (revision 268860)
+++ go-gcc.cc   (working copy)
@@ -25,6 +25,7 @@
 #include <gmp.h>
 
 #include "tree.h"
+#include "opts.h"
 #include "fold-const.h"
 #include "stringpool.h"
 #include "stor-layout.h"
@@ -3103,6 +3104,41 @@ Gcc_backend::function(Btype* fntype, con
       DECL_DECLARED_INLINE_P(decl) = 1;
     }
 
+  // Optimize thunk functions for size.  A thunk created for a defer
+  // statement that may call recover looks like:
+  //     if runtime.setdeferretaddr(L1) {
+  //             goto L1
+  //     }
+  //     realfn()
+  //   L1:
+  // The idea is that L1 should be the address to which realfn
+  // returns.  This only works if this little function is not over
+  // optimized.  At some point GCC started duplicating the epilogue in
+  // the basic-block reordering pass, breaking this assumption.
+  // Optimizing the function for size avoids duplicating the epilogue.
+  // This optimization shouldn't matter for any thunk since all thunks
+  // are small.
+  size_t pos = name.find("..thunk");
+  if (pos != std::string::npos)
+    {
+      for (pos += 7; pos < name.length(); ++pos)
+        {
+          if (name[pos] < '0' || name[pos] > '9')
+            break;
+        }
+      if (pos == name.length())
+        {
+          struct cl_optimization cur_opts;
+          cl_optimization_save(&cur_opts, &global_options);
+          global_options.x_optimize_size = 1;
+          global_options.x_optimize_fast = 0;
+          global_options.x_optimize_debug = 0;
+          DECL_FUNCTION_SPECIFIC_OPTIMIZATION(decl) =
+            build_optimization_node(&global_options);
+          cl_optimization_restore(&global_options, &cur_opts);
+        }
+    }
+
   go_preserve_from_gc(decl);
   return new Bfunction(decl);
 }
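For reference, the kind of Go code that leads the frontend to generate
such a recover-checking thunk is a deferred function literal that calls
recover, roughly as in the illustrative program below (not taken from
the patch or the testsuite):

    package main

    import "fmt"

    func mightPanic() {
            panic("boom")
    }

    // run's deferred function literal calls recover, so the compiler
    // wraps it in a generated thunk and uses the setdeferretaddr /
    // canrecover handshake to decide whether recover may return
    // non-nil.
    func run() (err error) {
            defer func() {
                    if r := recover(); r != nil {
                            err = fmt.Errorf("recovered: %v", r)
                    }
            }()
            mightPanic()
            return nil
    }

    func main() {
            fmt.Println(run())
    }

With the patch, the thunk generated for the deferred literal is
compiled at -Os, so the epilogue is not duplicated and the fast path in
canrecover can succeed again.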