https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106725
--- Comment #6 from Daniel Thornburgh <dthorn at google dot com> ---
I spent a little more time on this, and here's a more concrete reproducer of
GCC's current behavior.

The setup again has 3 files: main.c, lto.c, and ext.c. lto.c is a simple
getter-setter interface wrapping a global int. main.c sets the value using this
interface, then makes an __attribute__((leaf)) call to ext.c. This sets the
value to 0. This should be legal, since the call doesn't call back to main.c,
it calls to lto.c.

$ tail -n+1 *.c
==> ext.c <==
void set_value(int v);
void external_call(void) {
  set_value(0);
}

==> lto.c <==
static int value;
void set_value(int v) {
  value = v;
}
int get_value(void) {
  return value;
}

==> main.c <==
#include <stdio.h>
void set_value(int v);
int get_value(void);
__attribute__((leaf)) void external_call(void);

int main(void) {
  set_value(42);
  external_call();
  printf("%d\n", get_value());
}

If we compile main.c and lto.c together using the pre-WHOPR module-merging
flow, the resulting binary assumes that the external call cannot clobber the
value, and it thus prints 42 rather than zero.

$ gcc -c -O2 ext.c
$ gcc -O2 -flto-partition=none main.o lto.o ext.o
$ ./a.out
42

If you instead use WHOPR, it looks like this optimization doesn't trigger:

$ gcc -O2 -flto main.o lto.o ext.o
$ ./a.out
0

At least in the unpartitioned case, it looks like the optimizer is considering
__attribute__((leaf)) to apply to the whole LTO unit. I'm unsure what WPA's
semantics are, since there may be other reasons why this optimization wasn't
taken there.
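
For readers who want the two outcomes side by side without setting up an LTO build, here is a rough single-file model of the difference. It is only a sketch, not GCC output: as_written() and as_folded() are hypothetical names, and as_folded() merely assumes that the 42 result comes from the load of value being replaced by the constant stored before the leaf call.

```c
/* Single-file model of the three-file reproducer; not compiler output.
   as_written() and as_folded() are hypothetical names for illustration. */
static int value;

void set_value(int v) { value = v; }
int get_value(void) { return value; }

/* Stands in for ext.c: the leaf call re-enters lto.c, not main.c. */
void external_call(void) { set_value(0); }

/* What main.c asks for: the store made by external_call() is observed. */
int as_written(void) {
  set_value(42);
  external_call();
  return get_value(); /* reads 0 */
}

/* Assumed model of the -flto-partition=none result: the call is treated
   as unable to modify `value`, so the load becomes the earlier constant. */
int as_folded(void) {
  set_value(42);
  external_call();
  return 42;
}
```

Calling both from a small main shows the same 0 versus 42 split as the WHOPR and -flto-partition=none binaries, which is why the question is whether leaf is meant to cover the translation unit or the whole merged LTO unit.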