On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
> On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> >     * combine.c (make_compound_operation): Take known zero bits into
> >     account when checking for possible zero_extend.
> I'd strongly recommend writing some tests for this.  Extra credit if
> they can be run on an x86 target which gets more testing than s390.

I'll look into that later.

> If I go back to our original discussion, we have this going into combine:
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:SI 64)
>                 (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
>                     (const_int -65521 [0xffffffffffff000f])))
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>      (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
>         (expr_list:REG_UNUSED (reg:CC 33 %cc)
>             (nil))))
> (insn 7 6 12 2 (set (reg:DI 65)
>         (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207
> {*zero_extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 64)
>         (nil)))
> (insn 12 7 13 2 (set (reg/i:DI 2 %r2)
>         (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
>      (expr_list:REG_DEAD (reg:DI 65)
>         (nil)))
> 
> Which combine turns into:
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:SI 64)
>                 (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
>                     (const_int -65521 [0xffffffffffff000f])))
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>      (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
>         (expr_list:REG_UNUSED (reg:CC 33 %cc)
>             (nil))))
> (insn 12 7 13 2 (parallel [
>             (set (reg/i:DI 2 %r2)
>                 (and:DI (subreg:DI (reg:SI 64) 0)
>                  ^^^
>                     (const_int 4294901775 [0xffff000f])))
>                                            ^^^^^^^^^^
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:22 1474 {*anddi3}
>      (expr_list:REG_UNUSED (reg:CC 33 %cc)
>         (expr_list:REG_DEAD (reg:SI 64)
>             (nil))))
> 
> 
> Instead you want insn 12 to use a zero-extend to extend (reg:SI 64)
> into (reg:DI 2)?

Yes, because we get the zero extend for free in this case (through
the constant in the AND or because the input value is a function
argument that is already zero extended).

> Can't you achieve this in this clause:
> 
>      /* If the constant is one less than a power of two, this might be
>          representable by an extraction even if no shift is present.
>          If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>          we are in a COMPARE.  */
> 
> You extract the constant via UINTVAL (XEXP (x, 1)), then munge it
> based on nonzero_bits and pass the result to exact_log2?

That's what we tried first, but it resulted in worse code in many
places (saved about 250 instructions in the SPEC2006 testsuite but
added about 42000 elsewhere).  It was so bad that I didn't even
bother to check what's going on.  Probably this was triggered all
over the place by small constants like 1, 3, 7 and the like where
s390 has no cheap way for zero extension.  So I limited this to
constants that are actually mode masks, implicitly assuming that
there are zero extend instructions only for known modes (which is
true for s390 but may not for some other targets).  Being
conservative here shouldn't hurt; but I wonder whether there are
targets where this condition still allows too much.

> Though I do like how you've conditionalized on the cost of the and
> vs the cost of hte zero-extend.  So maybe your approach is
> ultimately better.

Actually we wanted to remove the call to rtx_cost() (because
usually combine just assumes that a zero extend is cheaper).  I've
probably forgotten to remove it before posting the patch.  For
s390 it's meaningless whether rtx_cost() is called or not because
at the moment it doesn't model the cost of zero extension (i.e.
the cost of either way is just one instruction, and without
context it's not possible to decide whether s390 needs a separate
instruction for the zero extend or whether it comes for free).

> Still curious your thoughts on doing it by just
> munging the constant you pass off to exact_log2 in that earlier
> clause.

> >+      /* If the one operand is a paradoxical subreg of a register or memory 
> >and
> >+     the constant (limited to the smaller mode) has only zero bits where
> >+     the sub expression has known zero bits, this can be expressed as
> >+     a zero_extend.  */
> >+      else if (GET_CODE (XEXP (x, 0)) == SUBREG)
> >+    {
> >+      rtx sub;
> >+
> >+      sub = XEXP (XEXP (x, 0), 0);
> >+      machine_mode sub_mode = GET_MODE (sub);
> >+      if ((REG_P (sub) || MEM_P (sub))
> >+          && GET_MODE_PRECISION (sub_mode) < mode_width
> >+          && (UINTVAL (XEXP (x, 1))
> >+              | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
> >+              ) == GET_MODE_MASK (sub_mode))
> I'd either factor something out or write a nested conditional to
> avoid the horrid line wrapping here and hopefully make the
> conditions easier to read.

Will do.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Reply via email to