On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 28/07/15 12:11, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <tom_devr...@mentor.com>
>> wrote:
>>>
>>> On 20/07/15 15:04, Tom de Vries wrote:
>>>>
>>>>
>>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>>> <tom_devr...@mentor.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries
>>>>>>> <tom_devr...@mentor.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I.
>>>>>>>>
>>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>>> matching
>>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>>>
>>>>>>>> expand_omp_for_generic:
>>>>>>>> ...
>>>>>>>>          struct loop *outer_loop;
>>>>>>>>          if (seq_loop)
>>>>>>>>            outer_loop = l0_bb->loop_father;
>>>>>>>>          else
>>>>>>>>            {
>>>>>>>>              outer_loop = alloc_loop ();
>>>>>>>>              outer_loop->header = l0_bb;
>>>>>>>>              outer_loop->latch = l2_bb;
>>>>>>>>              add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>>            }
>>>>>>>>
>>>>>>>>          if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>>            {
>>>>>>>>              struct loop *loop = alloc_loop ();
>>>>>>>>              loop->header = l1_bb;
>>>>>>>>              /* The loop may have multiple latches.  */
>>>>>>>>              add_loop (loop, outer_loop);
>>>>>>>>            }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>>> fixup, in
>>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>>> ...
>>>>>>>>          /* When the OMP expansion process cannot guarantee an
>>>>>>>> up-to-date
>>>>>>>>             loop tree arrange for the child function to fixup
>>>>>>>> loops.  */
>>>>>>>>          if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>>            child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>>> ...
>>>>>>>>
>>>>>>>> and expand_omp_for:
>>>>>>>> ...
>>>>>>>>      else
>>>>>>>>        /* If there isn't a continue then this is a degerate case
>>>>>>>> where
>>>>>>>>           the introduction of abnormal edges during lowering will
>>>>>>>> prevent
>>>>>>>>           original loops from being detected.  Fix that up.  */
>>>>>>>>        loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> However, loops are fixed up anyway, because the first pass we
>>>>>>>> execute
>>>>>>>> with
>>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>>
>>>>>>>> The new child function contains a function call to
>>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>>> subsequently
>>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>>
>>>>>>>>
>>>>>>>> II.
>>>>>>>>
>>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>>> processing
>>>>>>>> of the child function, either the loop structure is ok, or marked
>>>>>>>> for
>>>>>>>> fixup.
>>>>>>>>
>>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>>> loop is
>>>>>>>> being parallelized, both the outer and inner loop are cancelled.
>>>>>>>> Then
>>>>>>>> during
>>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>>> loop (the
>>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>>> verification failure:
>>>>>>>> ...
>>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>>> ...
>>>>>>>>
>>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>>> testcase
>>>>>>>> on
>>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>>> different
>>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>>> function
>>>>>>>> is processed by. ]
>>>>>>>>
>>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>>> child
>>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>>> since it
>>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>>> postpones
>>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>>> can
>>>>>>>> succeed.
>>>>>>>>
>>>>>>>>
>>>>
>>>> <SNIP>
>>>>
>>>>> Can we fix the root-cause of the issue instead?  That is, build a
>>>>> valid loop
>>>>> structure in the first place?
>>>>>
>>>>
>>>> This patch manages to keep the loop structure, that is, to not cancel
>>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>>> end of parloops.
>>>>
>>>> The transformation to insert the omp_for invalidates the loop state
>>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>>> we drop those in parloops.
>>>>
>>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>>> the omp_for, and keep it.
>>>>
>>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>>
>>>
>>> This updated patch tries a more minimal approach.
>>>
>>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the
>>> new
>>> exit instead.
>>>
>>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of
>>> pass_expand_omp_ssa.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp.
>> turning
>> that back on in execute_expand_omp).  The parloops code lacks comments and
>> the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
>> it should be difficult
>> to preserve simple latches as well - is this just because we insert
>> the GIMPLE_OMP_CONTINUE
>> in it?
>>
>
> It's not difficult to do. It's just that the omp-for that is generated via
> the normal route (omp-annotated source code) doesn't have this simple latch,
> and parloops just mimics that cfg shape.
>
> Attached updated patch preserves simple latches in parloops. We need some
> minor adjustments in omp-expand to handle that.
>
>> If execute_expand_omp is not performed in a loop pipeline where things
>> expect simple latches
>> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
>> at all?  If somebody
>> needs it it will just request simple latches.
>>
>
> I've looked into this. The (first?) pass that gives me trouble is
> pass_iv_optimize. It doesn't request simple latches, but it does need them.

Well, pass_iv_optimize is inside the loop pass which ensures we have
LOOPS_NORMAL.
Passes in it rely on that.

>> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
>> LOOPS_HAVE_SIMPLE_LATCHES
>> unset in execute_expand_omp.
>>
>> Ok with that change.
>>
>
> OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if
> bootstrap and reg-test succeeds)?

Yes.

Thanks,
Richard.

> Thanks,
> - Tom
>

Reply via email to