PR96191 [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191] was a bug raised against the aarch64 implementation of -fstack-protector. As it happens, the same bug affected arm, but AFAICT those are the only two affected targets.
-fstack-protector works by: * creating a special canary slot in the stack frame, * initialising it with a special canary value, and then * checking at the end of the function whether the canary slot still has the correct value. If the slot has changed value, the function calls a special stack-smash handler that would typically abort the program. On many targets, the code that sets up and tests the canary slot will need to load the canary value into registers. However, GCC tries to guarantee that this value does not remain in registers beyond the “test” and “set” operations. For example, the documentation of the stack_protect_test pattern says: This pattern, if defined, compares a @code{ptr_mode} value from the valid memory location in operand 1 with the memory in operand 0 without leaving the value in a register afterward and branches to operand 2 if the values were equal. If this pattern is not defined, then a plain compare pattern and conditional branch pattern is used. The bug in the PR was that aarch64 (and arm) did this when setting up the canary slot, but not when testing it. However, it's not obvious (to me) whether this is really necessary and what it's really protecting against. Even if we zero out the registers after these patterns, the canary value is still readily available by other means: (1) We don't make any effort to hide the address of the canary value (typically &__stack_chk_guard, although some targets support alternatives). It's not obvious what “hiding” this address would actually mean in practice, since it would often be easily predictable from other non-secret addresses. (2) The canary value is often available in stack locations, such as: (a) a canary in the current function, if the current function uses stack-smash protection (b) a “protected” caller further up the call stack (c) canary values left around by previous calls, if the canary value happens to occupy “dead space” in the current frame And being on the stack is of course fundamental to the whole scheme. It's also not clear where the requirement came from. It didn't seem to be in IBM's original implementation: https://gcc.gnu.org/pipermail/gcc-patches/2004-September/148566.html but was part of rth's initial reimplementation: https://gcc.gnu.org/pipermail/gcc-patches/2005-May/170338.html LLVM has taken the deliberate decision *not* to zero out the registers for set and test patterns, see: https://bugs.llvm.org/show_bug.cgi?id=46994 So if this zeroing isn't offering any significant protection, there's a danger that GCC is being uncompetitive on code size for very little gain. On the other side, I guess there's the argument that this is typically “only” 2 extra instructions per function. I think those instructions still have to pay their way though. E.g. for two more instructions we could clear the address register, and for one more instruction we could zero out the stack slot after testing it. It isn't obvious to me why the two extra instructions we currently have are worth it but those five aren't. Any thoughts? Is there a specific reason for the current trade-off? Is it something we should revisit? Thanks, Richard