On Mon, Mar 19, 2018 at 5:08 PM, Aldy Hernandez <al...@redhat.com> wrote:
> Hi Richard.
>
> As discussed in the PR, the problem here is that two different iterations
> of an IV are live outside of a loop.  This prevents us from using
> auto-inc/dec addressing on ARM and causes extra LEA instructions on x86.
>
> An abbreviated example is this:
>
> loop:
>   # p_9 = PHI <p_17(2), p_20(3)>
>   p_20 = p_9 + 18446744073709551615;
> goto loop
>   p_24 = p_9 + 18446744073709551614;
>   MEM[(char *)p_20 + -1B] = 45;
>
> Here both the previous IV (p_9) and the current IV (p_20) are used
> outside of the loop.  On ARM this keeps us from using auto-dec addressing,
> because one use is at offset -2 (from p_9) and the other at -1 (from p_20).
>
> With the attached patch, we attempt to rewrite out-of-loop uses of the IV in
> terms of the current (last) IV, p_20 in the case above.  With it, we end up
> with:
>
>   p_24 = p_20 + 18446744073709551615;
>   *p_24 = 45;
>
> ...which helps both x86 and ARM.
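The IV+CST rewrite above is plain modular arithmetic: on a 64-bit target the offset 18446744073709551615 is 2^64 - 1, i.e. -1, and 18446744073709551614 is -2. A minimal sketch in C, assuming a 64-bit target and modeling pointer values as uint64_t; pointer_plus(), rebase_offset() and check_example() are hypothetical helpers for illustration, not GCC internals or code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a POINTER_PLUS_EXPR on a 64-bit target: the
   offset is an unsigned 64-bit value, so adding 2^64 - 1 wraps around
   and behaves like subtracting 1.  */
static uint64_t pointer_plus(uint64_t p, uint64_t off)
{
  return p + off; /* unsigned arithmetic wraps modulo 2^64 */
}

/* Hypothetical helper: an out-of-loop use "prev + cst" of the previous
   IV can be rewritten as "cur + (cst - step)", where cur = prev + step
   is the current IV.  Also computed modulo 2^64.  */
static uint64_t rebase_offset(uint64_t cst, uint64_t step)
{
  return cst - step;
}

/* Replays the abbreviated example with an arbitrary value for p_9.  */
static void check_example(void)
{
  const uint64_t minus_one = UINT64_C(18446744073709551615);
  const uint64_t minus_two = UINT64_C(18446744073709551614);
  uint64_t p_9 = 0x1000;
  uint64_t p_20 = pointer_plus(p_9, minus_one);     /* p_9 - 1 */
  uint64_t p_24_old = pointer_plus(p_9, minus_two); /* p_9 - 2 */
  uint64_t p_24_new
    = pointer_plus(p_20, rebase_offset(minus_two, minus_one)); /* p_20 - 1 */

  assert(rebase_offset(minus_two, minus_one) == minus_one);
  assert(p_24_old == p_24_new);
}
```

Calling check_example() passes both assertions: p_24 lands on the same address whether it is computed from p_9 with -2 or from p_20 with -1, which is why both out-of-loop accesses can be expressed relative to p_20.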
>
> As you suggested in comment 38 on the PR, I special-case out-of-loop IV
> uses of the form IV+CST and propagate those accordingly (along with the
> MEM_REF above).  In the less specific cases, we undo the IV increment and
> use that value in all out-of-loop uses.  For instance, in the attached
> testcase, we rewrite:
>
>   george (p_9);
>
> into
>
>   _26 = p_20 + 1;
>   ...
>   george (_26);
>
> The attached testcase exercises both the IV+CST case and the more generic
> case with george().
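The generic fallback recovers the previous IV from the current one by subtracting the step: with a step of -1, p_9 == p_20 - (-1) == p_20 + 1, which is the _26 above. A minimal sketch, again assuming a 64-bit target and uint64_t-modeled pointers; undo_increment() and check_george_example() are hypothetical helpers, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the current IV value cur = prev + step
   (computed modulo 2^64), recover the previous IV value prev.  */
static uint64_t undo_increment(uint64_t cur, uint64_t step)
{
  return cur - step; /* wraps modulo 2^64, like the forward increment */
}

/* Replays the george() example with an arbitrary value for p_9.  */
static void check_george_example(void)
{
  const uint64_t step = UINT64_C(18446744073709551615); /* -1 */
  uint64_t p_9 = 0x1000;
  uint64_t p_20 = p_9 + step;                 /* p_20 = p_9 - 1 */
  uint64_t v_26 = undo_increment(p_20, step); /* models _26 = p_20 + 1 */

  assert(v_26 == p_20 + 1);
  assert(v_26 == p_9);
}
```

The same subtraction works for any constant step, so a single recomputed value can feed every out-of-loop use of p_9.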
>
> Although the original PR was for ARM, this behavior is also visible on x86,
> so I tested on x86 with a full bootstrap and regression test run.  I also
> ran the specific test with an x86-hosted ARM cross compiler and verified
> that it produces two auto-dec accesses.  For the original test (slightly
> different from the testcase in this patch), the code is 104 bytes with this
> patch versus 116 without it.  There is still a missed division optimization
> that would further reduce the code size; I will discuss it separately, as it
> is independent of this patch.
>
> Oh yeah, we could make this more generic and maybe handle any multiple of
> the constant, or perhaps *= and /=.  Perhaps something for the next stage1...
>
> OK for trunk?
Just FYI, this looks similar to what I did in
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html
That change was non-trivial and didn't give an obvious improvement at
the time.  But I still wonder whether this
can be done with low overhead when rewriting iv_use.

Thanks,
bin
> Aldy
