Hi Jeff,

There is an issue with your AArch64 patch: it fails to apply properly, and does
so silently when using 'patch'. I also noticed some odd control characters in
the other patches, but they didn't appear to fail (or at least everything
builds).

Anyway, with -Ofast -static the overall codesize increase is ~0.7% on SPEC2017,
with several benchmarks above 1% and one over 3%. So -O2 with dynamic linking
should get you well above 1% on average. This is a bit too much static overhead
to enable by default. The current version ICEs with a 64KB probe size, so I
can't see whether that has lower overhead.

I briefly looked at the generated sequences. The biggest issue is that they
don't work together and thus do not protect against jumping the stack guard.
For example, both alloca and outgoing arguments can allocate 4KB without
probing, then call a function which allocates another 3KB on top of that
(i.e. the max probe distance is 7KB...).
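
To make the arithmetic concrete, here is a minimal sketch in C. The helper
names are hypothetical and not taken from the patch, and it assumes a 4KB
guard region and that the guard can only be skipped when the gap between two
stack touches exceeds the guard size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: worst-case distance between two
   stack touches when the caller allocates CALLER_UNPROBED bytes without
   probing and the callee then allocates CALLEE_UNPROBED more bytes before
   its first probe.  */
static size_t
max_unprobed_distance (size_t caller_unprobed, size_t callee_unprobed)
{
  return caller_unprobed + callee_unprobed;
}

/* Assumption: a guard region of GUARD_SIZE bytes can only be stepped over
   if the gap between consecutive stack touches is larger than the guard.  */
static bool
can_jump_guard (size_t distance, size_t guard_size)
{
  assert (guard_size > 0);
  return distance > guard_size;
}
```

With the numbers above, max_unprobed_distance (4096, 3072) gives 7168 bytes,
and can_jump_guard (7168, 4096) is true.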

There are also too many probes emitted. For example, a function with a 7KB
frame emits 2 explicit probes when it should emit just 1: with a 4KB probe size
we can adjust the stack by 3KB then probe, then by another 4KB, then save
callee-saves and adjust by up to 1KB for outgoing arguments without inserting
more probes.
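
A rough way to count the probes, as a sketch only. The function name is
hypothetical, and the model rests on two assumptions: the caller may have left
up to max_outgoing bytes unprobed below its last stack touch, and the
callee-save stores at the bottom of the frame act as an implicit probe:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the patch's algorithm.  The span that must be
   covered is the caller's unprobed outgoing-args area plus our own frame.
   Touches are needed at least every PROBE_SIZE bytes; the last touch is
   assumed to be the callee-save store, so it is not an explicit probe.  */
static size_t
explicit_probes_needed (size_t frame_size, size_t probe_size,
                        size_t max_outgoing)
{
  assert (probe_size > 0);
  if (frame_size == 0)
    return 0;

  size_t span = max_outgoing + frame_size;
  size_t touches = (span + probe_size - 1) / probe_size; /* round up */
  return touches - 1; /* final touch is the callee-save store */
}
```

Under these assumptions, explicit_probes_needed (7 * 1024, 4096, 1024) is 1,
matching the 3KB + probe + 4KB sequence described above, and a 3KB frame needs
no explicit probe at all.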

I don't understand what last_probe_offset is for. It should always be zero
after saving the callee-saves (though currently it is not set correctly), and it
has to be set to a fixed value to limit the maximum outgoing args when doing
final_adjust.

Finally, emitting inline loops generates a large amount of code. Although we
can easily reduce the overhead, especially for alloca, I think using helper
functions would be best.

Wilco
