Jeff Law wrote:
> aarch64 is the first target that does not have any implicit probes in
> the caller.  Thus at prologue entry it must make conservative
> assumptions about the offset of the most recent probed address relative
> to the stack pointer.

No - like I mentioned before, that's neither correct nor acceptable, as it
would imply that ~70% of functions need a probe at entry. I did a quick run
across SPEC and found the outgoing argument size is > 1024 bytes in just 9
functions out of 147000! Those 9 were odd special cases due to auto-generated
code to interface between C and Fortran. This is extremely unlikely to occur
anywhere else. So even assuming an unchecked caller, large outgoing arguments
are simply not a realistic threat.

Therefore even when using a tiny 4K probe size we can safely adjust SP by 3KB
before needing an explicit probe - now only 0.6% of functions need a probe.
If we choose a proper minimum probe distance, say 64KB, explicit probes are
basically non-existent (just 35 functions, or ~0.02% of all functions,
are > 64KB). Clearly inserting probes can be the default as the impact on
code quality is negligible.
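
To make the arithmetic above concrete, here is a minimal sketch in Python
(purely illustrative, not GCC code). It assumes the caller's outgoing
arguments never exceed 1KB, in line with the SPEC numbers; the function and
constant names are made up for this example.

```python
# Illustrative model only; names are hypothetical, not GCC internals.
OUTGOING_ARGS_LIMIT = 1024  # assumed bound on the caller's outgoing arguments (bytes)


def max_unprobed_adjust(probe_size, outgoing_limit=OUTGOING_ARGS_LIMIT):
    """Largest SP adjustment (bytes) that needs no explicit probe."""
    return probe_size - outgoing_limit


def needs_explicit_probe(frame_size, probe_size):
    """True if the frame is larger than the unprobed budget."""
    return frame_size > max_unprobed_adjust(probe_size)


if __name__ == "__main__":
    print(max_unprobed_adjust(4 * 1024))   # 3072, i.e. 3KB
    print(max_unprobed_adjust(64 * 1024))  # 64512, i.e. 63KB
```

With a 4KB probe size the budget is 3KB, and with 64KB it is 63KB, so under
this assumption only frames above those sizes would need an explicit probe.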

With respect to implementation it is relatively easy to decide in
aarch64_layout_frame which frames need probes and where. I don't think
keeping a running offset of the last probe/store is useful; it'll just lead
to inefficiencies and bugs. The patch doesn't deal with the delayed stores
due to shrinkwrapping, for example. Inserting probes before the prolog would
be easier, e.g.:

sub tmp, sp, 65536
str xzr, [tmp, 1024]  // allow up to 1KB of outgoing arguments in callee
sub tmp, sp, 131072
str xzr, [tmp, 1024]
... normal prolog for frame size 128-192KB
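
As a rough sketch of how such a sequence could be derived from the frame
size (Python, illustrative only, not the patch and not GCC code): the helper
below emits one probe per full 64KB of frame, each offset by the assumed 1KB
of outgoing arguments. The helper name, the floor division at the range
edges, and the "tmp" placeholder register are assumptions for this example.

```python
# Illustrative only: reproduces the pattern of the example above.
OUTGOING_ARGS_LIMIT = 1024  # assumed 1KB of outgoing arguments in the callee


def pre_prolog_probes(frame_size, probe_interval=65536,
                      outgoing_limit=OUTGOING_ARGS_LIMIT, tmp="tmp"):
    """Return assembly-like lines, one probe per full probe_interval of frame."""
    lines = []
    # Assumption: floor division decides how many probes a frame gets.
    for i in range(1, frame_size // probe_interval + 1):
        lines.append(f"sub {tmp}, sp, {i * probe_interval}")
        lines.append(f"str xzr, [{tmp}, {outgoing_limit}]")
    return lines


if __name__ == "__main__":
    # A 160KB frame falls in the 128-192KB range used in the example.
    print("\n".join(pre_prolog_probes(160 * 1024)))
```

For a 160KB frame this yields the same two sub/str pairs as above; a frame
smaller than the probe interval yields none.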

Wilco
