On 2021/8/17 17:10, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/8/17 15:12, Richard Biener wrote:
>> On Tue, 17 Aug 2021, Xionghu Luo wrote:
>>
>>> Hi,
>>>
>>> On 2021/8/16 19:46, Richard Biener wrote:
>>>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
>>>>
>>>>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>>>>> nested loops. inn_loop is updated to inner loop, so it need be restored
>>>>> when exiting from innermost loop. With this patch, the store instruction
>>>>> in outer loop could also be moved out of outer loop by store motion.
>>>>> Any comments? Thanks.
>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>>>>> inn_loop when exiting from innermost loop.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>>>>> ---
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>>>>> gcc/tree-ssa-loop-im.c | 6 +++++-
>>>>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>>> new file mode 100644
>>>>> index 00000000000..097a5ee4a4b
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>>> @@ -0,0 +1,24 @@
>>>>> +/* PR/101293 */
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>>>>> +
>>>>> +struct X { int i; int j; int k;};
>>>>> +
>>>>> +void foo(struct X *x, int n, int l)
>>>>> +{
>>>>> + for (int j = 0; j < l; j++)
>>>>> + {
>>>>> + for (int i = 0; i < n; ++i)
>>>>> + {
>>>>> + int *p = &x->j;
>>>>> + int tem = *p;
>>>>> + x->j += tem * i;
>>>>> + }
>>>>> + int *r = &x->k;
>>>>> + int tem2 = *r;
>>>>> + x->k += tem2 * j;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" }
>>>>> }
>>>>> */
>>>>> +
>>>>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>>>>> index b24bc64f2a7..5ca4738b20e 100644
>>>>> --- a/gcc/tree-ssa-loop-im.c
>>>>> +++ b/gcc/tree-ssa-loop-im.c
>>>>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop,
>>>>> sbitmap
>>>>> @@ contains_call)
>>>>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>>> last = bb;
>>>>> + if (inn_loop != loop
>>>>> + && flow_loop_nested_p (bb->loop_father, inn_loop))
>>>>> + inn_loop = bb->loop_father;
>>>>> +
>>>>
>>>> The comment says
>>>>
>>>> /* In a loop that is always entered we may proceed anyway.
>>>> But record that we entered it and stop once we leave
>>>> it.
>>>> */
>>>> inn_loop = bb->loop_father;
>>>>
>>>> and your change would defeat that early return, no?
>>>
>>> The issue is the search method exits too early when iterating the outer
>>> loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9
>>> and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3
>>> doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are
>>> ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
>>> they won't be processed by store motion again.
>>>
>>>
>>> 5<----
>>> |\ |
>>> 8 \ 9
>>> | \ |
>>> --->3--->4
>>> | |
>>> 10---|
>>>
>>>
>>> SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
>>> patch, it will continue search when meet bb 3 until bb 4, then last is
>>> updated
>>> to bb 4, it will break until exit edge is found at bb 4 by
>>> "if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop code
>>> will
>>> set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
>>>
>>>
>>> while (1)
>>> {
>>> SET_ALWAYS_EXECUTED_IN (last, loop);
>>> if (last == loop->header)
>>> break;
>>> last = get_immediate_dominator (CDI_DOMINATORS, last);
>>> }
>>>
>>> After further discussion with Kewen, we found that the inn_loop variable is
>>> totally useless and could be removed.
>>>
>>>
>>>>
>>>>> if (bitmap_bit_p (contains_call, bb->index))
>>>>> break;
>>>>>
>>>>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
>>>>> @@ contains_call)
>>>>>
>>>>> if (bb->loop_father->header == bb)
>>>>> {
>>>>> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>>> + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch,
>>>>> bb))
>>>>> break;
>>>>
>>>> That's now a always false condition - a loops latch is always dominated
>>>> by its header. The condition as written tries to verify whether the
>>>> loop is always entered - mind we visit all blocks, not only those
>>>> always executed.
>>>
>>> Thanks for the catch! I am afraid the piece of code should be removed since
>>> it stops
>>> search of potential ALWAYS EXECUTED bb after inner loop...
>>
>> But the code says:
>>
>> /* In a loop that is always entered we may proceed anyway.
>> But record that we entered it and stop once we leave it.
>> */
>>
>> and you do not remove this comment still it doesn't hold anymore
>> after your patch. I don't say the current code is optimal - I just
>> say it does exactly what is documented.
>
> Removed.
>
>>
>>
>>>>
>>>> In fact for your testcase the x->j ref is _not_ always executed
>>>> since the inner loop is conditional on n > 0.
>>>
>>> Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in
>>> store-motion.
>>
>> Sorry, that wasn't clear. Yes, I agree the code fails to see always
>> executed blocks after inner loops. But since the code simply walks
>> all blocks in a loop instead of greedily following edges it has
>> to do that since the inner loop might exit the outer loop as well,
>> sth your change wouldn't honor. Consider
>
> This is already handled now, if there is an edge exiting out of outer
> loop directly, it will also early break, I've verified it with your case.
>
>
> FOR_EACH_EDGE (e, ei, bb->succs)
> {
> /* If there is an exit from this BB. */
> if (!flow_bb_inside_loop_p (loop, e->dest))
> break;
>>
>> while (--n)
>> {
>> do
>> {
>> if (do_exit)
>> goto out;
>> }
>> while (1);
>> p->x += 1;
>> }
>> out:;
>>
>> you'll see a CFG where the inner loop exits the outer loop as well.
>>
>> So I'm saying to improve the code you'll likely have to do more.
>> And add a testcase like the following
>>
>> void __attribute__((noipa))
>> foo (int n, int m, int f, int *p, int *q)
>> {
>> while (--n)
>> {
>> do
>> {
>> *q += 1;
>> if (f)
>> goto out;
>> }
>> while (--m);
>> *p += 1;
>> }
>> out:;
>> }
>>
>> int main()
>> {
>> int i = 0;
>> foo (10, 10, 1, (void *)0, &i);
>> if (i != 1)
>> __builtin_abort ();
>> return 0;
>> }
>>
>> Richard.
>>
>
> Updated the patch:
>
>
> [PATCH v2] Fix incomplete computation in fill_always_executed_in_1
>
> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
> nested loops. Current design will exit if an inner loop doesn't
> dominate outer loop's latch or exit after exiting from inner loop, which
> caused early return from outer loop, then ALWAYS EXECUTED blocks after
> inner loops is skipped.
> This patch removes the redundant inner loop check, but keep the check of
> exiting to out of outer loop from inner loop, then the store instruction
> in outer loop could also be moved out of outer loop by store motion.
This patch is regression tested pass on Power, and shows NO SPEC2017
performance change.
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-im.c (fill_always_executed_in_1): Remove
> inn_loop check.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> ---
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 +++++++++++++++++
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 30 ++++++++++++++++++++++
> gcc/tree-ssa-loop-im.c | 14 ----------
> 3 files changed, 54 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 00000000000..097a5ee4a4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,24 @@
> +/* PR/101293 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +struct X { int i; int j; int k;};
> +
> +void foo(struct X *x, int n, int l)
> +{
> + for (int j = 0; j < l; j++)
> + {
> + for (int i = 0; i < n; ++i)
> + {
> + int *p = &x->j;
> + int tem = *p;
> + x->j += tem * i;
> + }
> + int *r = &x->k;
> + int tem2 = *r;
> + x->k += tem2 * j;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> new file mode 100644
> index 00000000000..6e180de528e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> @@ -0,0 +1,30 @@
> +/* PR/101293 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +void __attribute__ ((noipa)) foo (int n, int m, int f, int *p, int *q)
> +{
> + while (--n)
> + {
> + do
> + {
> + *q += 1;
> + if (f)
> + goto out;
> + }
> + while (--m);
> + *p += 1;
> + }
> +out:;
> +}
> +
> +int main()
> +{
> + int i = 0;
> + foo (10, 10, 1, (void *) 0, &i);
> + if (i != 1)
> + __builtin_abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Executing store motion" 1 "lim2" } } */
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index b24bc64f2a7..c44f4390ce2 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3197,7 +3197,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> contains_call)
> basic_block bb = NULL, *bbs, last = NULL;
> unsigned i;
> edge e;
> - class loop *inn_loop = loop;
>
> if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> {
> @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> contains_call)
> to disprove this if possible). */
> if (bb->flags & BB_IRREDUCIBLE_LOOP)
> break;
> -
> - if (!flow_bb_inside_loop_p (inn_loop, bb))
> - break;
> -
> - if (bb->loop_father->header == bb)
> - {
> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> - break;
> -
> - /* In a loop that is always entered we may proceed anyway.
> - But record that we entered it and stop once we leave it. */
> - inn_loop = bb->loop_father;
> - }
> }
>
> while (1)
>
--
Thanks,
Xionghu