Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

Richard Sandiford Mon, 03 Jun 2024 04:33:30 -0700

Ajit Agarwal <[email protected]> writes:
>> [...]
>> If it is intentional, what distinguishes things like vperm and xxinsertw
>> (and all other unspecs) from plain addition?
>> 
>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>         (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>                  (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>
>
> Plain addition are not supported currently.
> We have not seen many cases with plain addition and this patch
> will not accept plain addition.
>
>  
>> This is why the intention behind the patch is important.  As it stands,
>> it isn't clear what criteria the patch is using to distinguish "valid"
>> fuse candidates from "invalid" ones.
>>
>
> Intention behind this patch all variants of UNSPEC instructions are
> supported and uses without UNSPEC are not supported in this patch.


But why make the distinction this way though?  UNSPEC is a very
GCC-specific concept.  Whether something is an UNSPEC or some other
RTL code depends largely on historical accident.  E.g. we have specific
codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).

It seems unlikely that GCC's choice about whether to represent something
as an UNSPEC or as another RTL code lines up neatly with the kind of
codegen decisions that a good assembly programmer would make.

I suppose another way of asking is to turn this around and say: what
kind of uses are you trying to exclude?  Presumably things are worse
if you remove this function override.  But what makes them worse?
What kind of uses cause the regression?

>>>>>>> [...]
>>>>>>> +  // Given insn_info pair I1 and I2, return true if offsets are in 
>>>>>>> order.
>>>>>>> +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
>>>>>>> +                                             rtl_ssa::insn_info *i2) = 
>>>>>>> 0;
>>>>>>> +
>>>>>>
>>>>>> This name seems a bit misleading.  The function is used in:
>>>>>>
>>>>>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, 
>>>>>> unsigned access_size,
>>>>>>        reversed = true;
>>>>>>      }
>>>>>>  
>>>>>> +  if (!m_pass->should_handle_unordered_insns (i1, i2))
>>>>>> +    return false;
>>>>>> +
>>>>>>    rtx cand_mems[2];
>>>>>>    rtx reg_ops[2];
>>>>>>    rtx pats[2];
>>>>>>
>>>>>> and so it acts as a general opt-out.  The insns aren't known to be 
>>>>>> unordered.
>>>>>>
>>>>>> It looks like the rs6000 override requires the original insns to be
>>>>>> in offset order.  Could you say why that's necessary?  (Both in email
>>>>>> and as a comment in the code.)
>>>>>>
>>>>>
>>>>> Yes rs6000 requires the original load insns to be in offset order.
>>>>> Some regression tests like vect-outer-4f fails if we do load pair
>>>>> fusion with load offsets are not in offset order as this breaks lxvp 
>>>>> semantics.
>>>>
>>>> How does it break the semantics though?  In principle, the generic code
>>>> only fuses if it has "proved" that the loads can happen in either order.
>>>> So it shouldn't matter which order the hardware does things in.
>>>>
>>>> Could you give an example of the kind of situation that you want
>>>> to avoid, and why it generates the wrong result?
>>>>
>>>
>>> (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM <vector(8) short unsigned int> 
>>> [(short unsigned int *)vectp.62_36 + 64B] ])
>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>                 (const_int 64 [0x40])) [1 MEM <vector(8) short unsigned 
>>> int> [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM <vector(8) short unsigned int> 
>>> [(short unsigned int *)vectp.62_36 + 80B] ])
>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>                 (const_int 80 [0x50])) [1 MEM <vector(8) short unsigned 
>>> int> [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM <vector(8) short unsigned int> 
>>> [(short unsigned int *)vectp.62_36 + 16B] ])
>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>                 (const_int 16 [0x10])) [1 MEM <vector(8) short unsigned 
>>> int> [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM <vector(8) short unsigned int> 
>>> [(short unsigned int *)vectp.62_36 + 32B] ])
>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>                 (const_int 32 [0x20])) [1 MEM <vector(8) short unsigned 
>>> int> [(short unsigned int *)vectp.62_36 + 32B]+0 S16 A16])) 
>>> {vsx_movv16qi_64bit}
>>>      (nil))
>>> (insn 22 21 37 2 (set (reg:V16QI 166 [ MEM <vector(8) short unsigned int> 
>>> [(short unsigned int *)vectp.62_36 + 48B] ])
>>>         (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>>                 (const_int 48 [0x30])) [1 MEM <vector(8) short unsigned 
>>> int> [(short unsigned int *)vectp.62_36 + 48B]+0 S16 A16])) 
>>> {vsx_movv16qi_64bit}
>>>      (nil))
>>>
>>> insn 22 and insn 31 is merged in the failure case and breaks the code.
>> 
>> What specifically goes wrong though?  This is just a sequence of loads
>> from the same base pointer, with no interdependencies, so it should be
>> possible to do the loads in any order.
>>
>
> Here in fuse_pair we set first and second based  as follows:
>
>   insn_info *first = (*i1 < *i2) ? i1 : i2;
>   insn_info *second = (first == i1) ? i2 : i1;
>
> This makes higher offset with first and lower offset with second.
> if (*i1 > *i2).
>
> and in set_multiword_subreg interface we pass first and second.
> Hence above code breaks because subreg offsets with 256 bits are not set 
> properly.
>
> If we pass i1 and i2 in set_multiword_subreg (i1, i2, load_p)
> in fuse_pair should_handle_unordered_insns is not required in try_fuse_pair.
>
> I will send the patch by removing the above interface check 
> in try_fuse_pair and pass i1 and i2 in set_multiword_subreg

Thanks for looking into it.

I think it'd be better to resolve the unspec discussion before
posting another version of the patch though.

Richard

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

Reply via email to