Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Ajit Agarwal Thu, 06 Jun 2024 06:54:04 -0700

Hello Richard:

On 06/06/24 2:28 pm, Richard Sandiford wrote:
> Hi,
> 
> Just some comments on the fuseable_load_p part, since that's what
> we were discussing last time.
> 
> It looks like this now relies on:
> 
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> +      /* We use DF data flow because we change location rtx
>> +     which is easier to find and modify.
>> +     We use mix of rtl-ssa def-use and DF data flow
>> +     where it is easier.  */
>> +      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>> +      df_analyze ();
>> +      df_set_flags (DF_DEFER_INSN_RESCAN);
> 
> But please don't do this!  For one thing, building DU/UD chains
> as well as rtl-ssa is really expensive in terms of compile time.
> But more importantly, modifications need to happen via rtl-ssa
> to ensure that the IL is kept up-to-date.  If we don't do that,
> later fuse attempts will be based on stale data and so could
> generate incorrect code.
>


Sure I have made changes to use only rtl-ssa and not to use
UD/DU chains. I will send the changes in separate subsequent
patch.

>> +// Check whether load can be fusable or not.
>> +// Return true if fuseable otherwise false.
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> +  for (auto def : info->defs())
>> +    {
>> +      auto set = dyn_cast<set_info *> (def);
>> +      for (auto use1 : set->nondebug_insn_uses ())
>> +    use1->set_is_live_out_use (true);
>> +    }
> 
> What was the reason for adding this loop?
>

The purpose of adding is to avoid assert failure in gcc/rtl-ssa/changes.cc:252

 
>> +
>> +  rtx_insn *rtl_insn = info ->rtl ();
>> +  rtx body = PATTERN (rtl_insn);
>> +  rtx dest_exp = SET_DEST (body);
>> +
>> +  if (REG_P (dest_exp) &&
>> +      (DF_REG_DEF_COUNT (REGNO (dest_exp)) > 1
> 
> The rtl-ssa way of checking this is:
> 
>   crtl->ssa->is_single_dominating_def (...)
> 
>> +       || DF_REG_EQ_USE_COUNT (REGNO (dest_exp)) > 0))
>> +    return  false;
> 
> Why are uses in notes a problem?  In the worst case, we should just be
> able to remove the note instead.
>

We can remove this and its no more required. I will make this
change in subsequent patches.
 
>> +
>> +  rtx addr = XEXP (SET_SRC (body), 0);
>> +
>> +  if (GET_CODE (addr) == PLUS
>> +      && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>> +    {
>> +      if (INTVAL (XEXP (addr, 1)) == -16)
>> +    return false;
>> +  }
> 
> What's special about -16?
> 

The tests like libgomp/for-8 fails with fused load with offset -16 and 0.
Thats why I have added this check.


>> +
>> +  df_ref use;
>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>> +    {
>> +      struct df_link *def_link = DF_REF_CHAIN (use);
>> +
>> +      if (!def_link || !def_link->ref
>> +      || DF_REF_IS_ARTIFICIAL (def_link->ref))
>> +    continue;
>> +
>> +      while (def_link && def_link->ref)
>> +    {
>> +      rtx_insn *insn = DF_REF_INSN (def_link->ref);
>> +      if (GET_CODE (PATTERN (insn)) == PARALLEL)
>> +        return false;
> 
> Why do you need to skip PARALLELs?
>

vec_select with parallel give failures final.cc "can't split-up with subreg 128 
(reg OO"
Thats why I have added this.

 
>> +
>> +      rtx set = single_set (insn);
>> +      if (set == NULL_RTX)
>> +        return false;
>> +
>> +      rtx op0 = SET_SRC (set);
>> +      rtx_code code = GET_CODE (op0);
>> +
>> +      // This check is added as register pairs are not generated
>> +      // by RA for neg:V2DF (fma: V2DF (reg1)
>> +      //                  (reg2)
>> +      //                  (neg:V2DF (reg3)))
>> +      if (GET_RTX_CLASS (code) == RTX_UNARY)
>> +        return false;
> 
> What's special about (neg (fma ...))?
>

I am not sure why register allocator fails allocating register pairs with
NEG Unary operation with fma operand. I have not debugged register allocator 
why the NEG
Unary operation with fma operand. 
 
>> +
>> +      def_link = def_link->next;
>> +    }
>> +     }
>> +  return true;
>> +}
> 
> Thanks,
> Richard

Thanks & Regards
Ajit

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Reply via email to