Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

Jeff Law Wed, 12 Jul 2017 14:09:06 -0700

On 07/12/2017 06:47 AM, Trevor Saunders wrote:
> On Tue, Jul 11, 2017 at 08:02:26PM -0600, Jeff Law wrote:
>> On 07/11/2017 06:20 PM, Wilco Dijkstra wrote:
>>> Jeff Law wrote:
>>>> aarch64 is the first target that does not have any implicit probes in
>>>> the caller.  Thus at prologue entry it must make conservative
>>>> assumptions about the offset of the most recent probed address relative
>>>> to the stack pointer.
>>>
>>> No - like I mentioned before that's not correct nor acceptable as it would 
>>> imply
>>> that ~70% of functions need a probe at entry. I did a quick run across SPEC 
>>> and
>>> found the outgoing argument size is > 1024 in just 9 functions out of 
>>> 147000!
>>> Those 9 were odd special cases due to auto generated code to interface 
>>> between
>>> C and Fortran. This is extremely unlikely to occur anywhere else. So even 
>>> assuming
>>> an unchecked caller, large outgoing arguments are simply not a realistic 
>>> threat.
>> Whether or not such frames exist in SPEC isn't the question.   THere's
>> nothing in the ABI or ISA that allows us to avoid those probes without
>> compromising security.
>>
>> Mixed code scenarios are going to be a fact of life, probably forever
>> unless we can convince every ISV providing software that works on top of
>> RHEL/*BSD/whatever  to turn on probing (based on my experiences, that
>> has exactly a zero chance of occurring).
> 
> On the other hand if probing is fast enough that it can be on by default
> in gcc there should be much less of it.  Even if you did change the ABI
> to require probing it seems unlikely that code violating that
> requirement would hit problems other than this security concern, so I'd
> expect there will be some non compliant asm out there.
Certainly my goal is to enable it by default one day.  Even if that's
ultimately done at the distro level or by a configure time switch.


Certainly if/when we reach that point the amount of unprotected code
will drop dramatically, but based on my experience I fully expect some
folks to turn it off.  They'll say something like "hey, we've audited
our code and we don't have any large stack or allocas, so I'm turning
this thing off to get some un-measurable performance gain."  It'd be an
uber-dumb thing to do, but it's going to happen.

And once that happens, they're open to attack again, even if they were
correct in their audit and their code only calls stack-clash protected
libraries.

Tweaking the ABI to mandate touching *sp in the outgoing args area &
alloca space is better because we likely wouldn't have an option to
avoid that access.   So a well meaning, but clueless, developer couldn't
turn the option off like they could stack checking.

> 
>> In a mixed code scenario a caller may have a large alloca or large
>> outgoing argument area that pushes the stack pointer into the guard page
>> without actually accessing the guard page.  That requires a callee which
>> is compiled with stack checking to make worst case assumptions at
>> function entry to protect itself as much as possible from these attacks.
> 
> It seems to me pretty important to ask how many programs out there have
> a caller that can push the stack into the guard page, but not past it.
I've actually spent a lot of time thing about that precise problem. You
don't need large frames to do that -- you just need controlled heap
leaks and/or controlled recursion.  ie, even if a function has no
allocas and a small frame, it can put the stack pointer into the guard.

THus in the callee we have to make worst case assumptions to be safe on
targets where there is no implicit probe in the caller.



> I'd expect that's not the case for most allocas, either they are safe,
> or can increase the stack arbitrarily.  I'd expect its more likely with
> outgoing arg or large buffer allocations though.
Unfortunately not.  A small alloca is not inherently safe unless you
also touch the allocated space.  That's precisely why the generic code
touches residuals after the loop that handles PROBE_INTERVAL chunks
(consider a small alloca in a loop).

That's also why I suggested a small backwards compatible tweak to the
aarch64 ABI.  Specifically there should be a mandated touch of *sp in
any function where the outgoing args size is greater than some well
defined value or if the function allocates any dynamic stack space.

That mandated touch of *sp at the ABI level would be enough to allow for
a much less conservative assumed state at function start and would allow
the vast majority of functions to need no probing at all.



  I think the largest
> buffer Qualys found was less than 400k? So 1 256k guard page should
> protect 95% of functions, and 1m or 2m  seems like enough to protect
> against all non malicious programs.  I'm not sure, but this is a 64 bit
> arch, so it seems like we should have the adress space for large guard
> pages like that.
I'm all for larger guards, particularly on 64 bit architectures.

We use 64k pages on aarch64 for RHEL which implicitly gives us a minimum
guard of 64k.  That would be a huge factor in any analysis I would do if
the aarch64 maintainers choose not to fully protect their architecture
and I was forced to make a recommendation for Red Hat and its customers.
 I hope I don't have to sit down and do the analysis on this and make
any kind of recommendation.

The fact that Qualys found nothing larger than X (for any X) in the code
they scanned isn't relevant.  There could well be code out there they
did not look at that uses > X or code that is yet to be written that
uses > X.

> 
> There's also the trade off that increasing the amount of code that
> probes reduces the set of code that can either move the stack into or
> past the guard page.
Correct.

> 
>> THe aarch64 maintainers can certain nix what I've done or modify it in
>> ways that suit them.  That is their choice as port maintainers.
>> However, Red Hat will have to evaluate the cost of reducing security for
>> our customer base against the performance improvement of such changes.
>> As I've said before, I do not know where that decision would fall.
>>
>>
>>>
>>> Therefore even when using a tiny 4K probe size we can safely adjust SP by 
>>> 3KB
>>> before needing an explicit probe - now only 0.6% of functions need a probe.
>>> If we choose a proper minimum probe distance, say 64KB, explicit probes are
>>> basically non-existent (just 35 functions, or ~0.02% of all functions are > 
>>> 64KB).
>>> Clearly inserting probes can be the default as the impact on code quality 
>>> is negligible.
>> Again, there's nothing that says 3k is safe.   You're picking an
>> arbitrary point that is safe in a codebase you've looked at.  But I'm
>> looking at this from a "what guarantees do I have from an ABI or ISA
>> standpoint".  The former may be more performant, but it's inherently
>> less secure than the latter.
> 
> On the other hand making -fstack-check=clash the default seems to me
> like a very significant security improvement.
Agreed and it's where I'd like this to go.

Jeff

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

Reply via email to