On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > There is one particular inefficiency I have noticed. This code: > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > abort (); > > from atomic-compare-exchange-3.c gets compiled into: > > l %r3,0(%r2) > larl %r1,v > cs %r3,%r4,0(%r1) > ipm %r1 > sra %r1,28 > st %r3,0(%r2) > ltr %r1,%r1 > jne .L3 > > which is extremely inefficient; it converts the condition code into > an integer using the slow ipm, sra sequence, just so that it can > convert the integer back into a condition code via ltr and branch > on it ...
This was caused (or perhaps abetted by) the representation of EQ as NE ^ 1. With the subsequent truncation and zero-extend, I think combine reached its insn limit of 3 before seeing everything it needed to see. I'm able to fix this problem by representing EQ as EQ before reload. For extimm targets this results in identical code; for older targets it requires avoidance of the constant pool, i.e. LHI+XR instead of X. l %r2,0(%r3) larl %r1,v cs %r2,%r5,0(%r1) st %r2,0(%r3) jne .L3 That fixed, we see the second CAS in that file: .loc 1 27 0 cs %r2,%r2,0(%r1) ipm %r5 sll %r5,28 lhi %r0,1 xr %r5,%r0 st %r2,0(%r3) ltr %r5,%r5 je .L20 This happens because CSE notices the cbranch vs 0, and sets r116 to zero along the path to 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)) at which point CSE decides that it would be cheaper to "re-use" the zero already in r116 instead of load another constant 0 here. After that, combine is ham-strung because r116 is not dead. I'm not quite sure the best way to fix this, since rtx_costs already has all constants cost 0. CSE ought not believe that r116 is better than a plain constant. CSE also shouldn't be extending the life of pseudos this way. A short-term possibility is to have the CAS insns accept general_operand, so that the 0 gets merged. With reload inheritance and post-reload cse, that might produce code that is "good enough". Certainly it's effective for the atomic-compare-exchange-3.c testcase. I'm less than happy with that, since the non-optimization of CAS depends on following code that is totally unrelated. This patch ought to be independent of any other patch so far. r~
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 0e43e51..bed6b79 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -5325,12 +5325,15 @@ (match_operand 3 "const0_operand")])) (clobber (reg:CC CC_REGNUM))])] "" - "emit_insn (gen_sne (operands[0], operands[2])); - if (GET_CODE (operands[1]) == EQ) - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); - DONE;") +{ + if (!TARGET_EXTIMM && GET_CODE (operands[1]) == EQ) + { + emit_insn (gen_seq_neimm (operands[0], operands[2])); + DONE; + } +}) -(define_insn_and_split "sne" +(define_insn_and_split "*sne" [(set (match_operand:SI 0 "register_operand" "=d") (ne:SI (match_operand:CCZ1 1 "register_operand" "0") (const_int 0))) @@ -5342,6 +5345,48 @@ [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28))) (clobber (reg:CC CC_REGNUM))])]) +(define_insn_and_split "*seq" + [(set (match_operand:SI 0 "register_operand" "=d") + (eq:SI (match_operand:CCZ1 1 "register_operand" "0") + (const_int 0))) + (clobber (reg:CC CC_REGNUM))] + "TARGET_EXTIMM" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + emit_insn (gen_lshrsi3 (op0, op0, GEN_INT (28))); + emit_insn (gen_xorsi3 (op0, op0, const1_rtx)); + DONE; +}) + +;; ??? Ideally we'd USE a const1_rtx, properly reloaded, but that makes +;; things more difficult for combine (which can only insert clobbers). +;; But perhaps it would be better still to have simply used a branch around +;; constant load instead of beginning with the IPM? +;; +;; What about LOCR for Z196? That's a more general question about cstore +;; being decomposed into movcc... + +(define_insn_and_split "seq_neimm" + [(set (match_operand:SI 0 "register_operand" "=d") + (eq:SI (match_operand:CCZ1 1 "register_operand" "0") + (const_int 0))) + (clobber (match_scratch:SI 2 "=&d")) + (clobber (reg:CC CC_REGNUM))] + "!TARGET_EXTIMM" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op2 = operands[2]; + emit_insn (gen_ashlsi3 (op0, op0, GEN_INT (28))); + emit_move_insn (op2, const1_rtx); + emit_insn (gen_xorsi3 (op0, op0, op2)); + DONE; +}) ;; ;; - Conditional move instructions (introduced with z196)