On 2021/6/10 00:24, Segher Boessenkool wrote:
> On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote:
>> On 2021/6/9 04:11, Segher Boessenkool wrote:
>>> On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
>>>>>> rejecting combination of insns 6 and 7
>>>>>> original costs 4 + 4 = 8
>>>>>> replacement cost 12
>>>>>
>>>>> So what instructions were these? Why did the store cost 4 but the new
>>>>> one costs 12?
>>>
>>> The *vsx_le_perm_store_<mode> instruction has the *preferred*
>>> alternative with cost 12, while the other alternative has cost 8. Why
>>> is that? That looks like a bug.
>>> (set_attr "length" "12,8")
>>
>> 12 was introduced by Mike's commit c477a6674364(r6-2577), and all the 5
>> vsx_le_perm_store_<mode> are set to 12 for modes VSX_D/VSX_W/V8HI/V16QI
>> /VSX_LE_128, I guess it is split to two rs6000_emit_le_vsx_permute before
>> reload, but 3 rs6000_emit_le_vsx_permute after reload, so the length is
>> 12, then it seems also not reasonable to change it from 12 to 8? And I am
>> not sure when the alternative 1 will be chosen?
>
> This is the instruction *length*, not the cost directly. The length
> has to be correct, not lower than it will turn out to be that is, or on
> some big testcases you will get branches that cannot reach their target,
> and the resulting ICEs.
>
> Alternatives are chosen by register allocation. Before register
> allocation attributes are taken as if alternative 0 is selected (well,
> the first enabled alternative is selected, same thing here).
>
> Which alternative is the expected (or wanted) one? Either put that one
> first, or if it is the longer one, give it an explicit cost.
>
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>> [(set (match_operand:VSX_LE_128 0 "memory_operand")
>> (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>> "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>> [(const_int 0)]
>> {
>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>> rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>> DONE;
>> })
>
> So it seems like it is only 3 insns in the very unlucky case? Normally
> it will end up as just one simple store?
I am afraid there is not "simple store" for *TImode on P8LE*. There is only
stxvd2x that rotates the element(stvx requires memory to be aligned, not
suitable pattern), so every vsx_le_perm_store_v1ti must be split to 3
instructions for alternative 0, it seems incorrect to force the cost to be 4.
So you want an explicit cost
> here then:
>
> ;; What is the insn_cost for this insn? The target hook can still override
> ;; this. For optimizing for size the "length" attribute is used instead.
> (define_attr "cost" "" (const_int 0))
>
> So you would use something like
>
> (set_attr "cost" "4,*")
>
> here (if I got that right, please check :-) )
>
> HtH,
>
>
> Segher
>
--
Thanks,
Xionghu