On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote: > On 04/27/2016 02:20 AM, Dominik Vogt wrote: > > * combine.c (make_compound_operation): Take known zero bits into > > account when checking for possible zero_extend. > I'd strongly recommend writing some tests for this. Extra credit if > they can be run on an x86 target which gets more testing than s390.
I'll look into that later. > If I go back to our original discussion, we have this going into combine: > > (insn 6 3 7 2 (parallel [ > (set (reg:SI 64) > (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32]) > (const_int -65521 [0xffffffffffff000f]))) > (clobber (reg:CC 33 %cc)) > ]) andc-immediate.c:21 1481 {*andsi3_zarch} > (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ]) > (expr_list:REG_UNUSED (reg:CC 33 %cc) > (nil)))) > (insn 7 6 12 2 (set (reg:DI 65) > (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207 > {*zero_extendsidi2} > (expr_list:REG_DEAD (reg:SI 64) > (nil))) > (insn 12 7 13 2 (set (reg/i:DI 2 %r2) > (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64} > (expr_list:REG_DEAD (reg:DI 65) > (nil))) > > Which combine turns into: > > (insn 6 3 7 2 (parallel [ > (set (reg:SI 64) > (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32]) > (const_int -65521 [0xffffffffffff000f]))) > (clobber (reg:CC 33 %cc)) > ]) andc-immediate.c:21 1481 {*andsi3_zarch} > (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ]) > (expr_list:REG_UNUSED (reg:CC 33 %cc) > (nil)))) > (insn 12 7 13 2 (parallel [ > (set (reg/i:DI 2 %r2) > (and:DI (subreg:DI (reg:SI 64) 0) > ^^^ > (const_int 4294901775 [0xffff000f]))) > ^^^^^^^^^^ > (clobber (reg:CC 33 %cc)) > ]) andc-immediate.c:22 1474 {*anddi3} > (expr_list:REG_UNUSED (reg:CC 33 %cc) > (expr_list:REG_DEAD (reg:SI 64) > (nil)))) > > > Instead you want insn 12 to use a zero-extend to extend (reg:SI 64) > into (reg:DI 2)? Yes, because we get the zero extend for free in this case (through the constant in the AND or because the input value is a function argument that is already zero extended). > Can't you achieve this in this clause: > > /* If the constant is one less than a power of two, this might be > representable by an extraction even if no shift is present. > If it doesn't end up being a ZERO_EXTEND, we will ignore it unless > we are in a COMPARE. */ > > You extract the constant via UINTVAL (XEXP (x, 1)), then munge it > based on nonzero_bits and pass the result to exact_log2? That's what we tried first, but it resulted in worse code in many places (saved about 250 instructions in the SPEC2006 testsuite but added about 42000 elsewhere). It was so bad that I didn't even bother to check what's going on. Probably this was triggered all over the place by small constants like 1, 3, 7 and the like where s390 has no cheap way for zero extension. So I limited this to constants that are actually mode masks, implicitly assuming that there are zero extend instructions only for known modes (which is true for s390 but may not for some other targets). Being conservative here shouldn't hurt; but I wonder whether there are targets where this condition still allows too much. > Though I do like how you've conditionalized on the cost of the and > vs the cost of hte zero-extend. So maybe your approach is > ultimately better. Actually we wanted to remove the call to rtx_cost() (because usually combine just assumes that a zero extend is cheaper). I've probably forgotten to remove it before posting the patch. For s390 it's meaningless whether rtx_cost() is called or not because at the moment it doesn't model the cost of zero extension (i.e. the cost of either way is just one instruction, and without context it's not possible to decide whether s390 needs a separate instruction for the zero extend or whether it comes for free). > Still curious your thoughts on doing it by just > munging the constant you pass off to exact_log2 in that earlier > clause. > >+ /* If the one operand is a paradoxical subreg of a register or memory > >and > >+ the constant (limited to the smaller mode) has only zero bits where > >+ the sub expression has known zero bits, this can be expressed as > >+ a zero_extend. */ > >+ else if (GET_CODE (XEXP (x, 0)) == SUBREG) > >+ { > >+ rtx sub; > >+ > >+ sub = XEXP (XEXP (x, 0), 0); > >+ machine_mode sub_mode = GET_MODE (sub); > >+ if ((REG_P (sub) || MEM_P (sub)) > >+ && GET_MODE_PRECISION (sub_mode) < mode_width > >+ && (UINTVAL (XEXP (x, 1)) > >+ | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode)) > >+ ) == GET_MODE_MASK (sub_mode)) > I'd either factor something out or write a nested conditional to > avoid the horrid line wrapping here and hopefully make the > conditions easier to read. Will do. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany