Hi Segher,

Thanks for the comments!

on 2021/1/15 上午4:43, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 09, 2020 at 05:49:53PM +0800, Kewen.Lin wrote:
>> This patch is to treat those new pseudo-to-pseudo copies
>> after hard-reg-to-pseudo-copy as zero costs.  The
>> justification is that these new copies are closely after
>> the corresponding hard-reg-to-pseudo-copy insns, register
>> allocation should be able to coalesce them and get them
>> eliminated.
> 
> Costing things that are not free as cost zero is very problematic.

I totally agree this.  But I'd argue that these new pseudo-to-pseudo
copies aren't counted as "not free".  Assuming they are not free, we
need to have one postpass to clean them once they are not used in
combine pass.  These new copies are closely after the corresponding
hard-reg-to-pseudo-copy insns, register allocation should be able to
coalesce them and get them eliminated.  So I think they are free
in the context of combine cost modeling.

> Cost zero is problematic in combine anyway (it means unknown cost, not
> no cost).

Yeah, zeroing cost is a bad wording here, the patch actually treats
them as free in costing, not change their costs to zero (unknown).

> 
>> Now these copies follow the normal costing scheme, the
>> below case dump shows the unexpected combination:
>>
>> ``` dump
>>
>> Trying 3, 2 -> 13:
>>     3: r119:DI=r132:DI
>>       REG_DEAD r132:DI
>>     2: r118:DI=r131:DI
>>       REG_DEAD r131:DI
>>    13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
>>       REG_DEAD r119:DI
>>       REG_DEAD r118:DI
> 
> This should not combine if 2+13 and 3+13 do not combine already.  Why
> did those not combine?

```
Trying 2 -> 13:
    2: r118:DI=r131:DI
      REG_DEAD r131:DI
   13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
      REG_DEAD r119:DI
      REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 128)
    (ior:DI (ashift:DI (reg/v:DI 119 [ f2 ])
            (const_int 32 [0x20]))
        (reg:DI 131)))

Trying 3 -> 13:
    3: r119:DI=r132:DI
      REG_DEAD r132:DI
   13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
      REG_DEAD r119:DI
      REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 128)
    (ior:DI (ashift:DI (reg:DI 132)
            (const_int 32 [0x20]))
        (reg/v:DI 118 [ f1 ])))
```

We don't have the pattern to match it.  So the patch in another thread
"rs6000: Use rldimi for vec init instead of shift + ior" can't keep the
desirable *rotl<mode>3_insert_3 instructions without adjusting the
costs like this patch.  Later 2,3 -> 13 will split it into shift and or.

> 
>> Failed to match this instruction:
>> (set (reg:DI 128)
>>     (ior:DI (ashift:DI (reg:DI 132)
>>             (const_int 32 [0x20]))
>>         (reg:DI 131)))
> 
> Likely because it results in this, and this insn isn't recognised.  So
> this can be fixed by adding a pattern for it (it needs to make sure all
> but the bottom 32 bits of reg 131 are zero; it can use nonzero_bits for
> that).
> 

Yeah, that's what I thought before, as another patch mentioned, I tried
to use nonzero_bits in define_insn but failed to.  I didn't found any
nonzero_bits usages in md files, the regression tesing also showed the
recog would have more rough nonzero_bits information than what we have
in combine and then fail to validate changes, so I did give up this
direction before.

Your patch below shows a brand new way to use nonzero_bits and avoid
the recog issue, it opens a window for me, so great!!!

> Long ago I had the following patch for this.  Not sure why I never
> submitted it, maybe there is something wronmg with it?
> 

If you don't mind, I'll do a check with bootstrappping and regression
testing and then get back to you.
BR,
Kewen

Reply via email to