On 04/27/2016 02:20 AM, Dominik Vogt wrote:
The attached patch is the result of a discussion about an S/390 issue
involving "and with complement" in some cases:

https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html
Combine would merge a ZERO_EXTEND and a SET, taking the known zero
bits into account, resulting in an AND.  Later on,
make_compound_operation() fails to replace that AND with a
ZERO_EXTEND (which we get for free on S/390) and leaves the AND,
eventually resulting in two consecutive AND instructions.
The current code in make_compound_operation() that detects
opportunities for ZERO_EXTEND does not work here because it does
not take the known zero bits into account:
  /* If the constant is one less than a power of two, this might be
     representable by an extraction even if no shift is present.
     If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
     we are in a COMPARE.  */
  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
    new_rtx = make_extraction (mode,
                               make_compound_operation (XEXP (x, 0),
                                                        next_code),
                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);
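
To make that concrete with the constants from the example below (a
back-of-the-envelope sketch, not part of the patch):

  /* The AND mask 0xffff000f is not one less than a power of two, so
     exact_log2 (0xffff000f + 1) is negative and the clause above never
     fires.  But the operand's known zero bits are at least 0x0000fff0
     there, and 0xffff000f | 0x0000fff0 == 0xffffffff == 2^32 - 1,
     which is exactly a 32-bit zero_extend.  */
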
An attempt to use the known zero bits directly in the above condition
resulted in many situations that generated worse code, so the patch
tries to fix this in a more conservative way.  While the effect is
completely positive on S/390, this will very likely have unforeseen
consequences on other targets.
Bootstrapped and regression tested on s390 and s390x only at the
moment.
Ciao
Dominik ^_^ ^_^
--
Dominik Vogt
IBM Germany
0001-ChangeLog
gcc/ChangeLog

        * combine.c (make_compound_operation): Take known zero bits into
        account when checking for possible zero_extend.
I'd strongly recommend writing some tests for this.  Extra credit if
they can be run on an x86 target, which gets more testing than s390.
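
For instance, a reduced version of the source behind the RTL below
(reconstructed, so the exact types and the constant are guesses) might
do as a starting point:

/* Hypothetical reduced test, reconstructed from the andc-immediate.c
   RTL quoted below: mask a 32-bit load and return it widened to 64
   bits.  The cleared bits are already known to be zero, so the
   widening should end up as a single zero_extend rather than a second
   AND.  */
unsigned long
f (unsigned int *a)
{
  return *a & 0xffff000f;
}
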
If I go back to our original discussion, we have this going into combine:
(insn 6 3 7 2 (parallel [
            (set (reg:SI 64)
                (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (const_int -65521 [0xffffffffffff000f])))
            (clobber (reg:CC 33 %cc))
        ]) andc-immediate.c:21 1481 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))

(insn 7 6 12 2 (set (reg:DI 65)
        (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207 {*zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SI 64)
        (nil)))

(insn 12 7 13 2 (set (reg/i:DI 2 %r2)
        (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
     (expr_list:REG_DEAD (reg:DI 65)
        (nil)))
Which combine turns into:
(insn 6 3 7 2 (parallel [
            (set (reg:SI 64)
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (const_int -65521 [0xffffffffffff000f])))
            (clobber (reg:CC 33 %cc))
        ]) andc-immediate.c:21 1481 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))

(insn 12 7 13 2 (parallel [
            (set (reg/i:DI 2 %r2)
                (and:DI (subreg:DI (reg:SI 64) 0)
                         ^^^
                    (const_int 4294901775 [0xffff000f])))
                               ^^^^^^^^^^
            (clobber (reg:CC 33 %cc))
        ]) andc-immediate.c:22 1474 {*anddi3}
     (expr_list:REG_UNUSED (reg:CC 33 %cc)
        (expr_list:REG_DEAD (reg:SI 64)
            (nil))))
Instead you want insn 12 to use a zero-extend to extend (reg:SI 64) into
(reg:DI 2)?
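I.e. something like this (just insn 7's zero_extend applied directly
to the hard register; a sketch, not actual combine output):

(insn 12 7 13 2 (set (reg/i:DI 2 %r2)
        (zero_extend:DI (reg:SI 64))) andc-immediate.c:22 1207 {*zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SI 64)
        (nil)))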
Can't you achieve this in this clause:
  /* If the constant is one less than a power of two, this might be
     representable by an extraction even if no shift is present.
     If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
     we are in a COMPARE.  */
You extract the constant via UINTVAL (XEXP (x, 1)), then munge it based
on nonzero_bits and pass the result to exact_log2?
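Something along these lines, perhaps (an untested sketch that just
reuses the names from the surrounding clause):

  /* Sketch of the above: fold the operand's known zero bits into the
     AND mask before the existing power-of-two test.  */
  unsigned HOST_WIDE_INT mask
    = (UINTVAL (XEXP (x, 1))
       | (~nonzero_bits (XEXP (x, 0), mode) & GET_MODE_MASK (mode)));
  if ((i = exact_log2 (mask + 1)) >= 0)
    new_rtx = make_extraction (mode,
                               make_compound_operation (XEXP (x, 0),
                                                        next_code),
                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);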
Though I do like how you've conditionalized on the cost of the AND vs
the cost of the zero-extend.  So maybe your approach is ultimately
better.  Still curious about your thoughts on doing it by just munging
the constant you pass off to exact_log2 in that earlier clause.
+  /* If the one operand is a paradoxical subreg of a register or memory and
+     the constant (limited to the smaller mode) has only zero bits where
+     the sub expression has known zero bits, this can be expressed as
+     a zero_extend.  */
+  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
+    {
+      rtx sub;
+
+      sub = XEXP (XEXP (x, 0), 0);
+      machine_mode sub_mode = GET_MODE (sub);
+      if ((REG_P (sub) || MEM_P (sub))
+          && GET_MODE_PRECISION (sub_mode) < mode_width
+          && (UINTVAL (XEXP (x, 1))
+              | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
+              ) == GET_MODE_MASK (sub_mode))
I'd either factor something out or write a nested conditional to avoid
the horrid line wrapping here and hopefully make the conditions easier
to read.
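E.g. something like this (rough sketch only; the temporaries are
made-up names and the zero_extend handling itself is elided):

  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
    {
      rtx sub = SUBREG_REG (XEXP (x, 0));
      machine_mode sub_mode = GET_MODE (sub);

      if ((REG_P (sub) || MEM_P (sub))
          && GET_MODE_PRECISION (sub_mode) < mode_width)
        {
          unsigned HOST_WIDE_INT sub_mask = GET_MODE_MASK (sub_mode);
          unsigned HOST_WIDE_INT known_zero
            = ~nonzero_bits (sub, sub_mode) & sub_mask;

          /* Keeping every bit that is not already known to be zero is
             just a zero_extend of SUB.  */
          if ((UINTVAL (XEXP (x, 1)) | known_zero) == sub_mask)
            {
              /* ... zero_extend handling as in the patch ...  */
            }
        }
    }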
Jeff