On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:

> This patch adds the ~(X - Y) -> ~X + Y simplification requested
> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> be safely negated.

 This regresses the VAX code produced for the cmpelim-eq-notsi.c test case 
(and its similar counterparts) with the `vax-netbsdelf' target.

 Previously this assembly:

        .text
        .align 1
.globl eq_notsi
        .type   eq_notsi, @function
eq_notsi:
        .word 0 # 35    [c=0]  procedure_entry_mask
        subl2 $4,%sp    # 46    [c=32]  *addsi3
        mcoml 4(%ap),%r0        # 32    [c=16]  *one_cmplsi2_ccz
        jeql .L1                # 34    [c=26]  *branch_ccz
        addl2 $2,%r0    # 31    [c=32]  *addsi3
.L1:
        ret             # 40    [c=0]  return
        .size   eq_notsi, .-eq_notsi

was produced.  Now this:

        .text
        .align 1
.globl eq_notsi
        .type   eq_notsi, @function
eq_notsi:
        .word 0 # 36    [c=0]  procedure_entry_mask
        subl2 $4,%sp    # 48    [c=32]  *addsi3
        movl 4(%ap),%r0 # 33    [c=16]  *movsi_2
        cmpl %r0,$-1    # 34    [c=8]  *cmpsi_ccz/1
        jeql .L3                # 35    [c=26]  *branch_ccz
        subl3 %r0,$1,%r0        # 32    [c=32]  *subsi3/1
        ret             # 27    [c=0]  return
.L3:
        clrl %r0                # 31    [c=2]  *movsi_2
        ret             # 41    [c=0]  return
        .size   eq_notsi, .-eq_notsi

is produced instead, which is clearly worse in terms of both performance 
and size.
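
 For illustration, the two sequences correspond roughly to the two C 
shapes below.  This is only a sketch reconstructed from the assembly 
listings, not the source of cmpelim-eq-notsi.c; the function names are 
made up, and unsigned arithmetic is used so that wraparound is well 
defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the first listing: complement, branch on the result being
   zero, otherwise add 2.  Hypothetical name, for illustration only.  */
static uint32_t
eq_notsi_mcoml_form (uint32_t x)
{
  uint32_t r = ~x;      /* mcoml: the result is tested directly */
  if (r == 0)           /* jeql: no separate comparison needed */
    return r;
  return r + 2;         /* addl2 $2,%r0 */
}

/* Shape of the second listing: explicit comparison against -1,
   then 1 - x.  Hypothetical name, for illustration only.  */
static uint32_t
eq_notsi_cmpl_form (uint32_t x)
{
  if (x == UINT32_MAX)  /* cmpl %r0,$-1 */
    return 0;           /* clrl %r0 */
  return 1 - x;         /* subl3 %r0,$1,%r0 */
}

/* Spot-check that both shapes agree, including at the edge values.  */
static void
check_forms_agree (void)
{
  static const uint32_t samples[] =
    { 0, 1, 2, 0x7fffffffu, 0x80000000u, 0xfffffffeu, UINT32_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (eq_notsi_mcoml_form (samples[i])
            == eq_notsi_cmpl_form (samples[i]));
}
```

 Both shapes compute the same value, since ~x + 2 equals 1 - x modulo 
2^32; they differ only in which constant the comparison is made against.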

 The key here is that the cost of constant 0, used here with a comparison 
operation that is eliminated after MCOML in the former assembly sequence, 
is lower (as per `vax_rtx_costs') in the VAX ISA than the cost of constant 
-1, used with CMPL in the latter sequence.  Not only is constant 0 an 
implied operand with some machine instructions, saving the cycles and 
space otherwise used for an explicitly encoded operand, but when used with 
a comparison operation it can usually be eliminated, so it should be 
preferred over all other constants.
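
 As a rough way to put a number on the difference, the [c=N] annotations 
from the two listings can simply be added up.  The sketch below does only 
that; it is a static sum over every instruction in each listing, not 
weighted by which path is executed, and it is not how GCC itself weighs 
the alternatives.  The array and function names are made up for 
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Per-insn costs copied from the [c=N] annotations, in listing order.  */
static const int old_costs[] = { 0, 32, 16, 26, 32, 0 };
static const int new_costs[] = { 0, 32, 16, 8, 26, 32, 0, 2, 0 };

/* Add up an array of per-insn costs.  */
static int
sum_costs (const int *costs, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += costs[i];
  return total;
}

/* Static cost total of the former (MCOML) sequence.  */
static int
old_total (void)
{
  return sum_costs (old_costs, sizeof old_costs / sizeof old_costs[0]);
}

/* Static cost total of the latter (CMPL) sequence.  */
static int
new_total (void)
{
  return sum_costs (new_costs, sizeof new_costs / sizeof new_costs[0]);
}

/* The latter sequence has both more instructions and a higher total.  */
static void
check_regression (void)
{
  assert (sizeof new_costs > sizeof old_costs);
  assert (new_total () > old_total ());
}
```

 With the figures above the totals come out at 106 for the former 
sequence and 116 for the latter, with 6 and 9 instructions respectively.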

 With the example you gave in the PR I can see an improvement with f3, f4, 
f7 and f8, a regression with f1 and f2, and no change in operation cost 
with f5 and f6.

 Shouldn't a transformation like this respect target-specific expression 
costs somehow then?  Depending on the individual case one form or the 
other might be cheaper, yet we assume here that both are equivalent in 
terms of performance and/or code size (as applicable for the optimisation 
mode chosen).

  Maciej
