Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Richard Sandiford Thu, 06 Jun 2024 07:33:59 -0700

Ajit Agarwal <aagar...@linux.ibm.com> writes:
> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>> Hi,
>> 
>> Just some comments on the fuseable_load_p part, since that's what
>> we were discussing last time.
>> 
>> It looks like this now relies on:
>> 
>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>> +      /* We use DF data flow because we change location rtx
>>> +    which is easier to find and modify.
>>> +    We use mix of rtl-ssa def-use and DF data flow
>>> +    where it is easier.  */
>>> +      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>> +      df_analyze ();
>>> +      df_set_flags (DF_DEFER_INSN_RESCAN);
>> 
>> But please don't do this!  For one thing, building DU/UD chains
>> as well as rtl-ssa is really expensive in terms of compile time.
>> But more importantly, modifications need to happen via rtl-ssa
>> to ensure that the IL is kept up-to-date.  If we don't do that,
>> later fuse attempts will be based on stale data and so could
>> generate incorrect code.
>> 
>
> Sure I have made changes to use only rtl-ssa and not to use
> UD/DU chains. I will send the changes in separate subsequent
> patch.


Thanks.  Before you send the patch though:

>>> +// Check whether load can be fusable or not.
>>> +// Return true if fuseable otherwise false.
>>> +bool
>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>> +{
>>> +  for (auto def : info->defs())
>>> +    {
>>> +      auto set = dyn_cast<set_info *> (def);
>>> +      for (auto use1 : set->nondebug_insn_uses ())
>>> +   use1->set_is_live_out_use (true);
>>> +    }
>> 
>> What was the reason for adding this loop?
>>
>
> The purpose of adding is to avoid assert failure in gcc/rtl-ssa/changes.cc:252

That assert is making sure that we don't delete a definition of a
register (or memory) while a real insn still uses it.  If the assert
is firing then something has gone wrong.

Live-out uses are a particular kind of use that occur at the end of
basic blocks.  It's incorrect to mark normal insn uses as live-out.

When an assert fails, it's important to understand why the failure
occurs, rather than brute-force the assert condition to true.

>>> [...]
>>> +
>>> +  rtx addr = XEXP (SET_SRC (body), 0);
>>> +
>>> +  if (GET_CODE (addr) == PLUS
>>> +      && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>>> +    {
>>> +      if (INTVAL (XEXP (addr, 1)) == -16)
>>> +   return false;
>>> +  }
>> 
>> What's special about -16?
>> 
>
> The tests like libgomp/for-8 fails with fused load with offset -16 and 0.
> Thats why I have added this check.

But why does it fail though?  It sounds like the testcase is pointing
out a problem in the pass (or perhaps elsewhere).  It's important that
we try to understand and fix the underlying problem.

>>> +
>>> +  df_ref use;
>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>> +    {
>>> +      struct df_link *def_link = DF_REF_CHAIN (use);
>>> +
>>> +      if (!def_link || !def_link->ref
>>> +     || DF_REF_IS_ARTIFICIAL (def_link->ref))
>>> +   continue;
>>> +
>>> +      while (def_link && def_link->ref)
>>> +   {
>>> +     rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>> +     if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>> +       return false;
>> 
>> Why do you need to skip PARALLELs?
>>
>
> vec_select with parallel give failures final.cc "can't split-up with subreg 
> 128 (reg OO"
> Thats why I have added this.

But in (vec_select ... (parallel ...)), the parallel won't be the 
PATTERN (insn).  It'll instead be a suboperand of the vec_select.

Here too it's important to understand why the final.cc failure occurs
and what the correct fix is.

>>> +
>>> +     rtx set = single_set (insn);
>>> +     if (set == NULL_RTX)
>>> +       return false;
>>> +
>>> +     rtx op0 = SET_SRC (set);
>>> +     rtx_code code = GET_CODE (op0);
>>> +
>>> +     // This check is added as register pairs are not generated
>>> +     // by RA for neg:V2DF (fma: V2DF (reg1)
>>> +     //                  (reg2)
>>> +     //                  (neg:V2DF (reg3)))
>>> +     if (GET_RTX_CLASS (code) == RTX_UNARY)
>>> +       return false;
>> 
>> What's special about (neg (fma ...))?
>>
>
> I am not sure why register allocator fails allocating register pairs with
> NEG Unary operation with fma operand. I have not debugged register allocator 
> why the NEG
> Unary operation with fma operand. 

I don't think it'll be specific to (neg (fma ...)).  Here too I think the
test showed up a problem with the pass and we should try to understand
and fix it.

More generally, it seems like you've run the testsuite, seen failures,
isolated a particular change that was wrong in some way, and then added
checks for that particular bit of input rtl.  But that isn't how the
testsuite is meant to be used.

The code needs to make sense from first principles, with comments to
explain decisions that are nonobvious.  (Well, to some extent anyway :))
The testsuite exists to verify the code.  If a testsuite failure occurs,
we need to understand what went wrong in the test, why exactly it happened,
which part of the code was at fault, and how that code should be fixed.

In other words, getting clean test results isn't an end in itself.
The testsuite is instead a tool for pointing out things that were
overlooked.

Thanks,
Richard

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Reply via email to