On 2/10/22 01:55, Richard Sandiford wrote:

There might be a little difference:

- Using push candidates means that a register to be ignored in pop
candidates will not be emitted again during the "restore" (pop_candidates
should always be a subset of push_candidates, since popping a register
without a push might not make sense).

The push candidates are simply a subset of the saved registers though.
Similarly, the pop candidates are simply a subset of the restored registers.
So I think the requirement operates at that level: the restored registers
must be a subset of the saved registers.

In other circumstances it could have been the other way around:
there might have been a change that stopped us from saving two
registers during the allocation, but we wanted to carry on restoring
two registers during the deallocation.  I don't think there's a
reason that the push candidates *have* to be a superset of the
pop candidates (even though they are with the current change).


Oh yeah, that sounds more reasonable.

When we use "pop_candidate[12]", one more insn is emitted:

0000000000400604 <main>:
     400604:       a9bf53f3        stp     x19, x20, [sp, #-16]!
     400608:       52800000        mov     w0, #0x0
+  40060c:       f94007f4        ldr     x20, [sp, #8]
     400610:       f84107f3        ldr     x19, [sp], #16
     400614:       d65f03c0        ret

But in the case of ignoring a specific register (like scs ignores x30),
there is no difference between the two (because we always need
to explicitly specify which registers to ignore in the parameter of
aarch64_restore_callee_saves).

I think this is the correct behaviour.  If we don't want to restore
a register at all then it should be excluded from the restore list
somehow.  In your case you're doing that be using a limit of
X29_REGNUM instead of X30_REGNUM.


Got it, I'll use pop candidates in the next version.

FWIW, I did wonder whether aarch64_restore_callee_saves should be
doing the scs pop, rather than aarch64_expand_epilogue, and in an
earlier draft of the previous review I'd asked for that.  It does
seem conceptually cleaner, but in practice, it would probably have
been awkward to implement.  E.g. we'd need to explicitly stop an
LDP being formed with X30 as the second register.


Well, then I think I should keep it the same here :).

But treating scs push and scs pop as part of the register save and
restore sequences would have one advantage: it would allow the
scs push and scs pop to be shrink-wrapped.


Sorry for my limited knowledge of shrink warping, I don't think I get
it here (I tried to find a case when compiling the kernel and some
gcc test cases but I still don't have a clue.).

I see that the bitmap of LR_REGNUM is cleared in
aarch64_get_separate_components and scs push/pop are x18 based operations.

If we handle them in aarch64_restore/save_callee_saves,
could scs push/pop be shrink-wrapped in some cases?

Thanks,
Dan

Reply via email to