Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

Jeff Law Wed, 22 Nov 2017 10:28:41 -0800

On 11/21/2017 04:57 AM, James Greenhalgh wrote:
> I've finally built up enough courage to start getting my head around this...
Can't blame you for avoiding :-) This stuff isn't my idea of fun either.


> 
> I see one outstanding issue sitting on this patch version:
> 
> On Sat, Oct 28, 2017 at 05:08:54AM +0100, Jeff Law wrote:
>> On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
>>> --param=stack-clash-protection-probe-interval=13
>>> --param=stack-clash-protection-guard-size=12
>>>
>>> So if there is a good reason to continue with 2 separate values, we must
>>> force probe interval <= guard size!
>> The param code really isn't designed to enforce values that are
>> inter-dependent.  It has a min, max & default values.  No more, no less.
>>  If you set up something inconsistent with the params, it's simply not
>> going to work.
>>
>>
>>>
>>> Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
>>> crashes due to the offsets used in the probes - we don't need large offsets
>>> as we want to probe close to the bottom of the stack.
>> Not a surprise.  While I tried to handle larger intervals, I certainly
>> didn't test them.  Given the ISA I wouldn't expect an interval > 12 to
>> be useful or necessarily even work correctly.
> 
> Understood - weird behaviour with weird params don't concern me.
Likewise.  My thinking was to expose them, folks that care use them to
test the impacts of changing them to find the right balance given the
various architectural and OS issues -- then we remove the params :-)

I don't feel that it's worth any significant time sanity checking the
params.

> 
>>> Functions with a large stack emit like alloca a lot of code, here I used
>>> --param=stack-clash-protection-probe-interval=15:
>>>
>>> int f1(int x)
>>> {
>>>   char arr[128*1024];
>>>   return arr[x];
>>> }
>>>
>>> f1:
>>>     mov     x16, 64512
>>>     sub     sp, sp, x16
>>>     .cfi_def_cfa_offset 64512
>>>     mov     x16, -32768
>>>     add     sp, sp, x16
>>>     .cfi_def_cfa_offset -1024
>>>     str     xzr, [sp, 32760]
>>>     add     sp, sp, x16
>>>     .cfi_def_cfa_offset -66560
>>>     str     xzr, [sp, 32760]
>>>     sub     sp, sp, #1024
>>>     .cfi_def_cfa_offset -65536
>>>     str     xzr, [sp, 1016]
>>>     ldrb    w0, [sp, w0, sxtw]
>>>     .cfi_def_cfa_offset 131072
>>>     add     sp, sp, 131072
>>>     .cfi_def_cfa_offset 0
>>>     ret
>>>
>>> Note the cfa offsets are wrong.
>> Yes.  They definitely look wrong.  There's a clear logic error in
>> setting up the ADJUST_CFA note when the probing interval is larger than
>> 2**12.  That should be easily fixed.  Let me poke at it.
> 
> This one does concern me, how did you get on? Did it respond well to
> prodding?
I've got a fix for the CFA note.  It's a one-liner due to my
misunderstanding of a bit of how some of the CFA stuff works.  It only
affects cases where a --param has adjusted the probing interval to a
point where its too large for a simple arithmetic insn to adjust the stack.



>>>
>>> The cfa entries are OK for this case. There is a mix of positive/negative 
>>> offsets which
>>> makes things confusing. Again there are 3 kinds of adjustments when for 
>>> this size we
>>> only need the loop.
>>>
>>> Reusing the existing gen_probe_stack_range code appears a bad idea since
>>> it ignores the probe interval and just defaults to 4KB. I don't see why it 
>>> should be
>>> any more complex than this:
>>>
>>>     sub     x16, sp, 262144  // only need temporary if > 1MB
>>> .LPSRL0:
>>>     sub     sp, sp, 65536
>>>     str     xzr, [sp, 1024]
>>>     cmp     sp, x16
>>>     b.ne    .LPSRL0
>>>     ldrb    w0, [sp, w0, sxtw]
>>>     add     sp, sp, 262144
>>>     ret
>>>
>>> Probe insertion if final adjustment >= 1024 also generates a lot of 
>>> redundant
>>> code - although this is more a theoretical issue given this is so rare.
>> Again, if ARM wants this optimized, then ARM's engineers are going to
>> have to take the lead here.  I've invested all I can reasonably invest
>> in terms of trying optimize the probing for this target.
> 
> Likewise here - thanks for your work so far, I have no expectation of this
> being fully optimized before I OK it to land.
> 
> Sorry for the big delay getting round to this patch, I hope to get serious
> time to put in to it later this week, and it would be helpful to close out
> the few remaining issues before I do.
That'd be awesome.  It's probably too late for further improvements to
land into Red Hat's RHEL 7.5 compiler.

However, I'm in the process now of backporting the kit to F27/F28 so the
glibc teams can bullet proof glibc there and we can certainly exploit
further refinements in F27/F28.

jeff

Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

Reply via email to