Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Ajit Agarwal Thu, 06 Jun 2024 14:17:25 -0700

Hello Richard:

On 06/06/24 8:03 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>>> Hi,
>>>
>>> Just some comments on the fuseable_load_p part, since that's what
>>> we were discussing last time.
>>>
>>> It looks like this now relies on:
>>>
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>> +      /* We use DF data flow because we change location rtx
>>>> +   which is easier to find and modify.
>>>> +   We use mix of rtl-ssa def-use and DF data flow
>>>> +   where it is easier.  */
>>>> +      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>>> +      df_analyze ();
>>>> +      df_set_flags (DF_DEFER_INSN_RESCAN);
>>>
>>> But please don't do this!  For one thing, building DU/UD chains
>>> as well as rtl-ssa is really expensive in terms of compile time.
>>> But more importantly, modifications need to happen via rtl-ssa
>>> to ensure that the IL is kept up-to-date.  If we don't do that,
>>> later fuse attempts will be based on stale data and so could
>>> generate incorrect code.
>>>
>>
>> Sure I have made changes to use only rtl-ssa and not to use
>> UD/DU chains. I will send the changes in separate subsequent
>> patch.
> 
> Thanks.  Before you send the patch though:
> 
>>>> +// Check whether load can be fusable or not.
>>>> +// Return true if fuseable otherwise false.
>>>> +bool
>>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>>> +{
>>>> +  for (auto def : info->defs())
>>>> +    {
>>>> +      auto set = dyn_cast<set_info *> (def);
>>>> +      for (auto use1 : set->nondebug_insn_uses ())
>>>> +  use1->set_is_live_out_use (true);
>>>> +    }
>>>
>>> What was the reason for adding this loop?
>>>
>>
>> The purpose of adding is to avoid assert failure in 
>> gcc/rtl-ssa/changes.cc:252
> 
> That assert is making sure that we don't delete a definition of a
> register (or memory) while a real insn still uses it.  If the assert
> is firing then something has gone wrong.
> 
> Live-out uses are a particular kind of use that occur at the end of
> basic blocks.  It's incorrect to mark normal insn uses as live-out.
> 
> When an assert fails, it's important to understand why the failure
> occurs, rather than brute-force the assert condition to true.
>


The above assert failure occurs when there is a debug insn and its
use is not live-out.

>>>> [...]
>>>> +
>>>> +  rtx addr = XEXP (SET_SRC (body), 0);
>>>> +
>>>> +  if (GET_CODE (addr) == PLUS
>>>> +      && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>>>> +    {
>>>> +      if (INTVAL (XEXP (addr, 1)) == -16)
>>>> +  return false;
>>>> +  }
>>>
>>> What's special about -16?
>>>
>>
>> The tests like libgomp/for-8 fails with fused load with offset -16 and 0.
>> Thats why I have added this check.
> 
> But why does it fail though?  It sounds like the testcase is pointing
> out a problem in the pass (or perhaps elsewhere).  It's important that
> we try to understand and fix the underlying problem.
> 

This check is not required anymore and will remove from subsequent patches.
>>>> +
>>>> +  df_ref use;
>>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>>> +    {
>>>> +      struct df_link *def_link = DF_REF_CHAIN (use);
>>>> +
>>>> +      if (!def_link || !def_link->ref
>>>> +    || DF_REF_IS_ARTIFICIAL (def_link->ref))
>>>> +  continue;
>>>> +
>>>> +      while (def_link && def_link->ref)
>>>> +  {
>>>> +    rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>>> +    if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>>> +      return false;
>>>
>>> Why do you need to skip PARALLELs?
>>>
>>
>> vec_select with parallel give failures final.cc "can't split-up with subreg 
>> 128 (reg OO"
>> Thats why I have added this.
> 
> But in (vec_select ... (parallel ...)), the parallel won't be the 
> PATTERN (insn).  It'll instead be a suboperand of the vec_select.
> 
> Here too it's important to understand why the final.cc failure occurs
> and what the correct fix is.
> 

subreg with vec_select operand already exists before fusion pass.
We overwrite them with subreg 128 bits from 256 OO mode operand.
Due to this in final.cc we couldnt splt at line 2807 and bails
out fatal_insn.

Currently we dont support already existing subreg vector operand
to generate register pairs.
We should bail out from fusion pass in this case.
>>>> +
>>>> +    rtx set = single_set (insn);
>>>> +    if (set == NULL_RTX)
>>>> +      return false;
>>>> +
>>>> +    rtx op0 = SET_SRC (set);
>>>> +    rtx_code code = GET_CODE (op0);
>>>> +
>>>> +    // This check is added as register pairs are not generated
>>>> +    // by RA for neg:V2DF (fma: V2DF (reg1)
>>>> +    //                  (reg2)
>>>> +    //                  (neg:V2DF (reg3)))
>>>> +    if (GET_RTX_CLASS (code) == RTX_UNARY)
>>>> +      return false;
>>>
>>> What's special about (neg (fma ...))?
>>>
>>
>> I am not sure why register allocator fails allocating register pairs with
>> NEG Unary operation with fma operand. I have not debugged register allocator 
>> why the NEG
>> Unary operation with fma operand. 
> 

For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits are
set correctly. 
IRA marked them spill candidates as spill priority is zero.

Due to this LRA reload pass couldn't allocate register pairs.
> I don't think it'll be specific to (neg (fma ...)).  Here too I think the
> test showed up a problem with the pass and we should try to understand
> and fix it.
> 
> More generally, it seems like you've run the testsuite, seen failures,
> isolated a particular change that was wrong in some way, and then added
> checks for that particular bit of input rtl.  But that isn't how the
> testsuite is meant to be used.
> 
> The code needs to make sense from first principles, with comments to
> explain decisions that are nonobvious.  (Well, to some extent anyway :))
> The testsuite exists to verify the code.  If a testsuite failure occurs,
> we need to understand what went wrong in the test, why exactly it happened,
> which part of the code was at fault, and how that code should be fixed.
> 
> In other words, getting clean test results isn't an end in itself.
> The testsuite is instead a tool for pointing out things that were
> overlooked.
> 
> Thanks,
> Richard

Thanks & Regards
Ajit

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Reply via email to