On Wed, May 5, 2021 at 12:37 PM Christoph Muellner <cmuell...@gcc.gnu.org> wrote:
> The existing CAS implementation uses an INSN definition, which provides
> the core LR/SC sequence. Additionally to that, there is a follow-up code,
> that evaluates the results and calculates the return values.
> This has two drawbacks: a) an extension to sub-word CAS implementations
> is not possible (even if, then it would be unmaintainable), and b) the
> implementation is hard to maintain/improve.
> This patch provides a programmatic implementation of CAS, similar
> like many other architectures are having one.
>

A comment that Andrew Waterman made to me today about the safety of
this under various circumstances got me thinking, and I realized that
without the special cas pattern we can get reloads in the middle of the
sequence which would be bad. Experimenting a bit, I managed to prove
it. This is using the old version of the patch which I already had
handy, but I'm sure the new version will behave roughly the same way.

Using the testsuite testcase atomic-compare-exchange-3.c as before, and
adding a lot of -ffixed-X options to simulate high register pressure,
with the compiler command

    ./xgcc -B./ -O2 -S tmp.c -ffixed-x16 -ffixed-x17 -ffixed-x18
    -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23
    -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28
    -ffixed-x29 -ffixed-x30 -ffixed-x31 -ffixed-x15 -ffixed-x14
    -ffixed-x13 -ffixed-x12 -ffixed-s0 -ffixed-s1 -ffixed-t2
    -ffixed-t1 -ffixed-t0

I get for the first lr/sc loop

    .L2:
            lui     a1,%hi(v)
            addi    a0,a1,%lo(v)
            lr.w    a1, 0(a0)
            ld      a0,8(sp)
            sw      a1,24(sp)
            bne     a1,a0,.L39
            lui     a1,%hi(v)
            addi    a0,a1,%lo(v)
            lw      a1,16(sp)
            sd      ra,24(sp)
            sc.w    ra, a1, 0(a0)
            sext.w  a1,ra
            ld      ra,24(sp)
            bne     a1,zero,.L2

and note all of the misc load/store instructions added by reload. I
don't think this is safe or guaranteed to work. With the cas pattern,
any reloads are guaranteed to be emitted before and/or after the lr/sc
loop.
With the separate patterns, there is no way to ensure that we won't get
accidental reloads in the middle of the lr/sc loop.

I think we need to keep the cas pattern. We can always put C code
inside the output template of the cas pattern if that is helpful. It
can do any necessary tests and then return an appropriate string for
the instructions we want.

Jim
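
For reference, the kind of source that makes the compiler emit an lr/sc loop like the one discussed here can be sketched as below. This is not the contents of atomic-compare-exchange-3.c, only a minimal illustration. The function name cas_v and the initial value of the global v are assumptions made for the example. The __atomic_compare_exchange_n builtin is the standard GCC one.

```c
#include <stdbool.h>

/* Hypothetical global, standing in for the symbol `v` that the
   assembly above addresses with %hi(v)/%lo(v).  */
int v = 0;

/* Strong compare-and-swap on v (illustrative helper, not from the
   testsuite).  If v == expected, store desired into v and return
   true; otherwise leave v unchanged and return false.  On RISC-V
   with the A extension, GCC typically expands this into an
   lr.w/sc.w retry loop.  */
bool
cas_v (int expected, int desired)
{
  return __atomic_compare_exchange_n (&v, &expected, desired,
                                      false /* weak = false: strong CAS */,
                                      __ATOMIC_SEQ_CST,
                                      __ATOMIC_SEQ_CST);
}
```

Compiling a function like this with many -ffixed-X options is one way to put the register allocator under the kind of pressure described above.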