PR96191 [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191]
was a bug raised against the aarch64 implementation of -fstack-protector.
As it happens, the same bug affected arm, but AFAICT those are the only
two affected targets.

-fstack-protector works by:

* creating a special canary slot in the stack frame,
* initialising it with a special canary value, and then
* checking at the end of the function whether the canary slot still has
  the correct value.

If the slot has changed value, the function calls a special stack-smash
handler that would typically abort the program.

On many targets, the code that sets up and tests the canary slot will
need to load the canary value into registers.  However, GCC tries
to guarantee that this value does not remain in registers beyond the
“test” and “set” operations.  For example, the documentation of the
stack_protect_test pattern says:

  This pattern, if defined, compares a @code{ptr_mode} value from the
  valid memory location in operand 1 with the memory in operand 0 without
  leaving the value in a register afterward and branches to operand 2 if
  the values were equal.

  If this pattern is not defined, then a plain compare pattern and
  conditional branch pattern is used.

The bug in the PR was that aarch64 (and arm) did this when setting up
the canary slot, but not when testing it.

However, it's not obvious (to me) whether this is really necessary and
what it's really protecting against.  Even if we zero out the registers
after these patterns, the canary value is still readily available by
other means:

(1) We don't make any effort to hide the address of the canary value
    (typically &__stack_chk_guard, although some targets support
    alternatives).  It's not obvious what “hiding” this address
    would actually mean in practice, since it would often be easily
    predictable from other non-secret addresses.

(2) The canary value is often available in stack locations, such as:

    (a) a canary in the current function, if the current function
        uses stack-smash protection

    (b) a “protected” caller further up the call stack

    (c) canary values left around by previous calls, if the canary value
        happens to occupy “dead space” in the current frame

    And being on the stack is of course fundamental to the whole scheme.

It's also not clear where the requirement came from.  It didn't seem to
be in IBM's original implementation:

  https://gcc.gnu.org/pipermail/gcc-patches/2004-September/148566.html

but was part of rth's initial reimplementation:

  https://gcc.gnu.org/pipermail/gcc-patches/2005-May/170338.html

LLVM has taken the deliberate decision *not* to zero out the registers
for set and test patterns, see:

  https://bugs.llvm.org/show_bug.cgi?id=46994

So if this zeroing isn't offering any significant protection, there's a
danger that GCC is being uncompetitive on code size for very little gain.

On the other side, I guess there's the argument that this is typically
“only” 2 extra instructions per function.  I think those instructions
still have to pay their way though.  E.g. for two more instructions we
could clear the address register, and for one more instruction we could
zero out the stack slot after testing it.  It isn't obvious to me why
the two extra instructions we currently have are worth it but those
five aren't.

Any thoughts?  Is there a specific reason for the current trade-off?
Is it something we should revisit?

Thanks,
Richard

Reply via email to