On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
>
> This middle-end patch proposes the "hard register constant propagation"
>
> pass be performed up to three times on each basic block (up from the
>
> current two times) if the second pass successfully made changes.
>
>
>
> The motivation for three passes is to handle the "swap idiom" (i.e.
>
> t = x; x = y; y = t;" sequences) that get generated by register allocation
>
> (reload).
>
>
>
> Consider the x86_64 test case for __int128 addition recently discussed
>
> on gcc-patches.  With that proposed patch, the input to the cprop_hardreg
>
> pass looks like:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rsi, %rdi
>
>         movq    %r8, %rsi
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %rsi %rax
>
>         adcq    %rdi, %rdx
>
>         ret
>
>
>
> where the first three instructions effectively swap %rsi and %rdi.
>
>
>
> On the first pass of cprop_hardreg, we notice that the third insn,
>
> %rsi := %r8, is redundant and can eliminated/propagated to produce:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rsi, %rdi
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %r8 %rax
>
>         adcq    %rdi, %rdx
>
>         ret
>
>
>
> Because a successful propagation was found, cprop_hardreg then runs
>
> a second pass/sweep on affected basic blocks (using worklist), and
>
> on this second pass notices that the second instruction, %rdi := %rsi,
>
> may now be propagated (%rsi was killed in the before the first transform),
>
> and after a second pass, we now end up with:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %r8, %rax
>
>         adcq    %rsi, %rdx
>
>         ret
>
>
>
> which is the current behaviour on mainline.  However, a third and final
>
> pass would now notice that the first insn, "%r8 := %rdi" is also now
>
> eliminable, and a third iteration would produce optimal code:
>
>
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %rdi, %rax
>
>         adcq    %rsi, %rdx
>
>         ret
>
>
>
> The patch below creates an additional worklist, third_pass, that is
>
> populated with the basic block id's of blocks that were updated during
>
> the second pass.  Does the motivation for three passes (reload doesn't
>
> generate more or less than three instructions to swap a pair of registers)
>
> seem reasonable for all targets?  If cprop_hardreg is considered an
>
> expensive pass, this change could be gated based on basic block count
>
> or similar.  Finally, I should point out that this a regression fix;
>
> GCC 4.8 generated optimal code with two moves (whereas GCC 12 required
>
> 5 moves, up from GCC 11's 4 moves).
>
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>
> and make -k check, both with and without --target_board=unix{-m32} with
>
> no new failures.  Thoughts?  Ok for mainline?

Can you instead refactor the code to have

auto_vec<int> worklist1, worklist2;
vec<int> *worklist = &worklist1;

and alternate between worklist1 and worklist2 N times (const int N = 2
is OK for now I guess, OTOH for -O1 we might want to stick to 1).

it might be tempting to maintain the number of times we visited a block
and just directly iterate on changed BBs.  There's also the comment
in cprop_hardreg_bb:

  /* If a block has a single predecessor, that we've already
     processed, begin with the value data that was live at
     the end of the predecessor block.  */
  /* ??? Ought to use more intelligent queuing of blocks.  */

which suggests we want to improve on its dataflow (like iterating
in RPO order so we can intersect the live value data of predecessors?)
That said, calling df_analyze () between iterations is what makes
iterating expensive I guess, as well as cprop_hardreg_debug.

Richard.

>
>
>
>
> 2022-06-02  Roger Sayle  <ro...@nextmovesoftware.com>
>
>
>
> gcc/ChangeLog
>
>         * regcprop.cc (pass_cprop_hardreg::execute): Perform a third
>
>         iteration over each basic block that was updated by the second
>
>         iteration.
>
>
>
>
>
> Thanks in advance,
>
> Roger
>
> --
>
>
>

Reply via email to