On Wed, May 5, 2021 at 12:37 PM Christoph Muellner <cmuell...@gcc.gnu.org>
wrote:

> The existing CAS implementation uses an INSN definition, which provides
> the core LR/SC sequence. Additionally to that, there is a follow-up code,
> that evaluates the results and calculates the return values.
> This has two drawbacks: a) an extension to sub-word CAS implementations
> is not possible (even if, then it would be unmaintainable), and b) the
> implementation is hard to maintain/improve.
> This patch provides a programmatic implementation of CAS, similar
> like many other architectures are having one.
>

A comment that Andrew Waterman made to me today about the safety of this
under various circumstances got me thinking, and I realized that without
the special cas pattern we can get reloads in the middle of the sequence
which would be bad.  Experimenting a bit, I managed to prove it.  This is
using the old version of the patch which I already had handy, but I'm sure
the new version will behave roughly the same way.  Using the testsuite
testcase atomic-compare-exchange-3.c as before, and adding a lot of
-ffixed-X options to simulate high register pressure, with the compiler
command
./xgcc -B./ -O2 -S tmp.c -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19
-ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25
-ffixed-x26 -ffixed-x27 -ffixed-x28 -ffixed-x29 -ffixed-x30 -ffixed-x31
-ffixed-x15 -ffixed-x14 -ffixed-x13 -ffixed-x12 -ffixed-s0 -ffixed-s1
-ffixed-t2 -ffixed-t1 -ffixed-t0
I get for the first lr/sc loop
.L2:
        lui     a1,%hi(v)
        addi    a0,a1,%lo(v)
        lr.w a1, 0(a0)
        ld      a0,8(sp)
        sw      a1,24(sp)
        bne     a1,a0,.L39
        lui     a1,%hi(v)
        addi    a0,a1,%lo(v)
        lw      a1,16(sp)
        sd      ra,24(sp)
        sc.w ra, a1, 0(a0)
        sext.w  a1,ra
        ld      ra,24(sp)
        bne     a1,zero,.L2
and note all of the misc load/store instructions added by reload.  I don't
think this is safe or guaranteed to work.  With the cas pattern, any
reloads are guaranteed to be emitted before and/or after the lr/sc loop.
With the separate patterns, there is no way to ensure that we won't get
accidental reloads in the middle of the lr/sc loop.

I think we need to keep the cas pattern.  We can always put C code inside
the output template of the cas pattern if that is helpful.  It can do any
necessary tests and then return an appropriate string for the instructions
we want.

Jim

Reply via email to