Richard Sandiford <richard.sandif...@arm.com> writes:
> Richard Biener <rguent...@suse.de> writes:
>> On Fri, 4 Apr 2025, Richard Sandiford wrote:
>>
>>> Another problem in PR101523 was that, after each successful 2->2
>>> combination attempt, distribute_links would search further and further
>>> for the next combinable use of the i2 destination.  Each search would
>>> start at i2 itself, making the search quadratic in the worst case.
>>> 
>>> In a 2->2 combination, if i2 is unchanged, the search can start at i3
>>> instead of i2.  The same thing applies to i2 when distributing i2's
>>> links, since the only changes to earlier instructions are the deletion
>>> of i0 and i1.
>>> 
>>> This change, combined with the previous split_i2i3 patch, gives a
>>> 34.6% speedup in combine for the testcase in PR101523.  Combine
>>> goes from being 41% to 34% of compile time.
>>
>> From my analysis this patch is OK (I do wonder why not always
>> starting from iN when distributing links for iN, but I guess
>> that combining into parallel & splitting can actually move
>> link sources/destinations in odd ways
>
> Right.  In particular, if we split part of i3 into i2, some log
> links can move up from i3 to i2.  I think it would be safe to
> pass iN unconditionally and then use LUIDs to check whether the

...I meant pass i2 unconditionally for i3links, on the basis that
we certainly don't need to search before i2.

> passed-in instruction is before or after the definition, picking
> the later of the two.  But that felt like too much of a rabiit hole.
>
> Richard
>
>
>> - I don't claim I fully understand this).
>>
>> Richard.
>>
>>> gcc/
>>>     PR testsuite/116398
>>>     * combine.cc (distribute_links): Take an optional start point.
>>>     (try_combine): If only i3 has changed, only distribute i3's links,
>>>     not i2's.  Start the search for the new use from i3 rather than
>>>     from the definition instruction.  Likewise start the search for
>>>     the new use from i2 when distributing i2's links.
>>> ---
>>>  gcc/combine.cc | 27 +++++++++++++++++++--------
>>>  1 file changed, 19 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/gcc/combine.cc b/gcc/combine.cc
>>> index 0664f8dd433..9eae1a0111e 100644
>>> --- a/gcc/combine.cc
>>> +++ b/gcc/combine.cc
>>> @@ -472,7 +472,7 @@ static void move_deaths (rtx, rtx, int, rtx_insn *, rtx 
>>> *);
>>>  static bool reg_bitfield_target_p (rtx, rtx);
>>>  static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *,
>>>                           rtx, rtx, rtx);
>>> -static void distribute_links (struct insn_link *);
>>> +static void distribute_links (struct insn_link *, rtx_insn * = nullptr);
>>>  static void mark_used_regs_combine (rtx);
>>>  static void record_promoted_value (rtx_insn *, rtx);
>>>  static bool unmentioned_reg_p (rtx, rtx);
>>> @@ -4590,10 +4590,15 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
>>> *i1, rtx_insn *i0,
>>>                         NULL_RTX, NULL_RTX, NULL_RTX);
>>>        }
>>>  
>>> -    distribute_links (i3links);
>>> -    distribute_links (i2links);
>>> -    distribute_links (i1links);
>>> -    distribute_links (i0links);
>>> +    if (only_i3_changed)
>>> +      distribute_links (i3links, i3);
>>> +    else
>>> +      {
>>> +   distribute_links (i3links);
>>> +   distribute_links (i2links, i2);
>>> +   distribute_links (i1links);
>>> +   distribute_links (i0links);
>>> +      }
>>>  
>>>      if (REG_P (i2dest))
>>>        {
>>> @@ -14984,10 +14989,13 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
>>> rtx_insn *i3, rtx_insn *i2,
>>>  
>>>  /* Similarly to above, distribute the LOG_LINKS that used to be present on
>>>     I3, I2, and I1 to new locations.  This is also called to add a link
>>> -   pointing at I3 when I3's destination is changed.  */
>>> +   pointing at I3 when I3's destination is changed.
>>> +
>>> +   If START is nonnull and an insn, we know that the next location for each
>>> +   link is no earlier than START.  */
>>>  
>>>  static void
>>> -distribute_links (struct insn_link *links)
>>> +distribute_links (struct insn_link *links, rtx_insn *start)
>>>  {
>>>    struct insn_link *link, *next_link;
>>>  
>>> @@ -15053,7 +15061,10 @@ distribute_links (struct insn_link *links)
>>>      I3 to I2.  Also note that not much searching is typically done here
>>>      since most links don't point very far away.  */
>>>  
>>> -      for (insn = NEXT_INSN (link->insn);
>>> +      insn = start;
>>> +      if (!insn || NOTE_P (insn))
>>> +   insn = NEXT_INSN (link->insn);
>>> +      for (;
>>>        (insn && (this_basic_block->next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>>>                  || BB_HEAD (this_basic_block->next_bb) != insn));
>>>        insn = NEXT_INSN (insn))
>>> 

Reply via email to