Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

James Greenhalgh Mon, 27 Nov 2017 10:45:41 -0800

On Wed, Nov 22, 2017 at 06:28:24PM +0000, Jeff Law wrote:
> On 11/21/2017 04:57 AM, James Greenhalgh wrote:
> > I've finally built up enough courage to start getting my head around this...
> Can't blame you for avoiding :-) This stuff isn't my idea of fun either.


Right, here's where I'm up to... I think that I have a reasonable grasp of
the overall intentions for this patch and what we're trying to do.

We want to put out regular probes (writes of zero) to the stack, at a regular
interval (given by PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL ), starting
at a known offset from the stack
base ( PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE ). There's some subtetly
around when and where we need probes, what the efficient sequences to
achieve that would be, etc. and handling that complexity, and the trouble
of having two possible stack adjustments to protect (initial adjustment and
final adjustment), is where most of the body of the patch comes from.

So, as far as I can read there are a couple of design points that I need
clarified. If we get through these it might be possible for Wilco to pick
up the patch and optimise the cases he's marked as a concern earlier.

Let me try to expand on these...

One point of contention in the review so far has been whether the default
values for PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL and
PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE are correct for AArch64. As this
has some implications for the algorithm we would like to use for probing,
where exactly we'd like to emit the probes, and might need a glibc patch
it will be helpful to narrow this down to an answer.

I see two positions.

Wilco would like both parameters set to 16 - that means (ignoring the
caller-protected stuff for now) a guard page size of 64k, and a probe
every 64k.

Jeff's current patch has a 64k guard page size, and a probe every 4k.

What concerns me is this from Jeff from couple of emails back:

> Given the ISA I wouldn't expect an interval > 12 to be useful or necessarily
> even work correctly.

There's a clear difference of opinion here!

I'd guess the difference of opinion is whether you need to probe every page
(4k) or whether you need to probe on each multiple of the minimum guard page
size (64k), but maybe it is just a difficulty in juggling registers? For
Wilco's position to work, we'd need to be happy emitting a probe once for
every 64k - from the sounds of the other comments on this thread, there's no
problem with that - we could have a minimum size of 64k for
PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL (though the current code would
probably need to change to accomodate that). Clearly we can reduce the
number of probes required to 1/16th if this were feasible.

So, first design question - other than potential troubles grabbing scratch
registers (we'll revisit this), is there a reason that would stop us deciding
to default to 64k for both parameters on AArch64?

Where this gets messy is handling the stack write. The current design 
will drop the stack pointer, then emit a probe back near the top where we
were previosuly sitting. i.e.

  sp = sp - 4096
  *(sp + 4088) = 0

For larger probe intervals, we end up juggling bigger numbers, which makes
the calculation slightly more unwiedly.

One suggestion from Wilco would be to change this sequence to:

  sp = sp - 4096
  *(sp) = 0

In other words, drop the stack and then emit a probe where we are now
sitting. That would save us juggling the large number for the load offset.

Is there an obvious flaw I've missed in probing at the bottom of the range
rather than the top? It seems to me that as long as we were to ensure we
managed to emit the first probe (the one at the bottom of GUARD_SIZE)
correctly, we could probably get away with this - but I've missed some of the
early discussions on the topic.

Maybe the reasoning for this comes from the optimisation for sharing
responsibility between caller and callee - which I'll admit to not fully
understanding yet?

I think my understanding of the ideal code sequence from Wilco would be
(modulo corner cases):

  /* Initial probe, assuming that our caller was well behaved, we need
     to probe 64k below where they were last obliged to probe.  First drop
     the stack pointer to the right place, then probe as required.  */
  sp = sp - (GUARD_PAGE_SIZE - caller_saved_1k);
  *(sp) = 0;
  /* Subsequent probes, one every probe interval.  */
  sp = sp - PROBE_INTERVAL;
  *(sp) = 0;
  sp = sp - PROBE_INTERVAL;
  *(sp) = 0;
  sp = sp - PROBE_INTERVAL;
  *(sp) = 0;
  sp = sp - residual
  *(sp) = 0;

Is this good enough to provide the level of protection we need?

I think if I can understand these two design questions I can make a better
job of the review here. I still don't want to push for any optimisation
for AArch64 (we can hack that in ourselves :-) ) just to understand where
we can be flexible, and where we'd be shooting ourselves in the foot if
we screwed up. Thanks for the earlier replies, they've been helpful.

Thanks,
James

Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

Reply via email to