On Wed, Nov 22, 2017 at 06:28:24PM +0000, Jeff Law wrote: > On 11/21/2017 04:57 AM, James Greenhalgh wrote: > > I've finally built up enough courage to start getting my head around this... > Can't blame you for avoiding :-) This stuff isn't my idea of fun either.
Right, here's where I'm up to... I think that I have a reasonable grasp of the overall intentions for this patch and what we're trying to do. We want to put out regular probes (writes of zero) to the stack, at a regular interval (given by PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL ), starting at a known offset from the stack base ( PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE ). There's some subtetly around when and where we need probes, what the efficient sequences to achieve that would be, etc. and handling that complexity, and the trouble of having two possible stack adjustments to protect (initial adjustment and final adjustment), is where most of the body of the patch comes from. So, as far as I can read there are a couple of design points that I need clarified. If we get through these it might be possible for Wilco to pick up the patch and optimise the cases he's marked as a concern earlier. Let me try to expand on these... One point of contention in the review so far has been whether the default values for PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL and PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE are correct for AArch64. As this has some implications for the algorithm we would like to use for probing, where exactly we'd like to emit the probes, and might need a glibc patch it will be helpful to narrow this down to an answer. I see two positions. Wilco would like both parameters set to 16 - that means (ignoring the caller-protected stuff for now) a guard page size of 64k, and a probe every 64k. Jeff's current patch has a 64k guard page size, and a probe every 4k. What concerns me is this from Jeff from couple of emails back: > Given the ISA I wouldn't expect an interval > 12 to be useful or necessarily > even work correctly. There's a clear difference of opinion here! I'd guess the difference of opinion is whether you need to probe every page (4k) or whether you need to probe on each multiple of the minimum guard page size (64k), but maybe it is just a difficulty in juggling registers? For Wilco's position to work, we'd need to be happy emitting a probe once for every 64k - from the sounds of the other comments on this thread, there's no problem with that - we could have a minimum size of 64k for PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL (though the current code would probably need to change to accomodate that). Clearly we can reduce the number of probes required to 1/16th if this were feasible. So, first design question - other than potential troubles grabbing scratch registers (we'll revisit this), is there a reason that would stop us deciding to default to 64k for both parameters on AArch64? Where this gets messy is handling the stack write. The current design will drop the stack pointer, then emit a probe back near the top where we were previosuly sitting. i.e. sp = sp - 4096 *(sp + 4088) = 0 For larger probe intervals, we end up juggling bigger numbers, which makes the calculation slightly more unwiedly. One suggestion from Wilco would be to change this sequence to: sp = sp - 4096 *(sp) = 0 In other words, drop the stack and then emit a probe where we are now sitting. That would save us juggling the large number for the load offset. Is there an obvious flaw I've missed in probing at the bottom of the range rather than the top? It seems to me that as long as we were to ensure we managed to emit the first probe (the one at the bottom of GUARD_SIZE) correctly, we could probably get away with this - but I've missed some of the early discussions on the topic. Maybe the reasoning for this comes from the optimisation for sharing responsibility between caller and callee - which I'll admit to not fully understanding yet? I think my understanding of the ideal code sequence from Wilco would be (modulo corner cases): /* Initial probe, assuming that our caller was well behaved, we need to probe 64k below where they were last obliged to probe. First drop the stack pointer to the right place, then probe as required. */ sp = sp - (GUARD_PAGE_SIZE - caller_saved_1k); *(sp) = 0; /* Subsequent probes, one every probe interval. */ sp = sp - PROBE_INTERVAL; *(sp) = 0; sp = sp - PROBE_INTERVAL; *(sp) = 0; sp = sp - PROBE_INTERVAL; *(sp) = 0; sp = sp - residual *(sp) = 0; Is this good enough to provide the level of protection we need? I think if I can understand these two design questions I can make a better job of the review here. I still don't want to push for any optimisation for AArch64 (we can hack that in ourselves :-) ) just to understand where we can be flexible, and where we'd be shooting ourselves in the foot if we screwed up. Thanks for the earlier replies, they've been helpful. Thanks, James