On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 16/07/15 10:44, Richard Biener wrote:
>>
>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <tom_devr...@mentor.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I.
>>>
>>> In openmp expansion of loops, we do some effort to try to create matching
>>> loops in the loop state of the child function, f.i.in
>>> expand_omp_for_generic:
>>> ...
>>>        struct loop *outer_loop;
>>>        if (seq_loop)
>>>          outer_loop = l0_bb->loop_father;
>>>        else
>>>          {
>>>            outer_loop = alloc_loop ();
>>>            outer_loop->header = l0_bb;
>>>            outer_loop->latch = l2_bb;
>>>            add_loop (outer_loop, l0_bb->loop_father);
>>>          }
>>>
>>>        if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>          {
>>>            struct loop *loop = alloc_loop ();
>>>            loop->header = l1_bb;
>>>            /* The loop may have multiple latches.  */
>>>            add_loop (loop, outer_loop);
>>>          }
>>> ...
>>>
>>> And if that doesn't work out, we try to mark the loop state for fixup, in
>>> expand_omp_taskreg and expand_omp_target:
>>> ...
>>>        /* When the OMP expansion process cannot guarantee an up-to-date
>>>           loop tree arrange for the child function to fixup loops.  */
>>>        if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>          child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>> ...
>>>
>>> and expand_omp_for:
>>> ...
>>>    else
>>>      /* If there isn't a continue then this is a degerate case where
>>>         the introduction of abnormal edges during lowering will prevent
>>>         original loops from being detected.  Fix that up.  */
>>>      loops_state_set (LOOPS_NEED_FIXUP);
>>> ...
>>>
>>> However, loops are fixed up anyway, because the first pass we execute
>>> with
>>> the new child function is pass_fixup_cfg.
>>>
>>> The new child function contains a function call to
>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>> subsequently
>>> the loops with LOOPS_NEED_FIXUP.
>>>
>>>
>>> II.
>>>
>>> This patch adds a verification that at the end of the omp-expand
>>> processing
>>> of the child function, either the loop structure is ok, or marked for
>>> fixup.
>>>
>>> This verfication triggered a failure in parloops. When an outer loop is
>>> being parallelized, both the outer and inner loop are cancelled. Then
>>> during
>>> omp-expansion, we create a loop in the loop state for the outer loop (the
>>> one that is transformed), but not for the inner, which causes the
>>> verification failure:
>>> ...
>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>> ...
>>>
>>> [ I ran into this verification failure with an openacc kernels testcase
>>> on
>>> the gomp-4_0-branch, where parloops is called additionally from a
>>> different
>>> location, and pass_fixup_cfg is not the first pass that the child
>>> function
>>> is processed by. ]
>>>
>>> The patch contains a bit that makes sure that the loop state of the child
>>> function is marked for fixup in parloops. The bit is non-trival since it
>>> create a loop state and sets the fixup flag on the loop state, but
>>> postpones
>>> the init_loops_structure call till move_sese_region_to_fn, where it can
>>> succeed.
>>>
>>>
>>> III.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> +  struct loops *loops;
>> +  int loop_state_flags = 0;
>> +  if (dest_cfun->x_current_loops == NULL)
>> +    {
>> +      /* Initialize an empty loop tree.  */
>> +      loops = ggc_cleared_alloc<struct loops> ();
>> +      set_loops_for_fn (dest_cfun, loops);
>> +    }
>> +  else
>> +    {
>> +      loops = dest_cfun->x_current_loops;
>> +      loop_state_flags = loops->state;
>> +    }
>> +
>> +  if (loops->tree_root == NULL)
>> +    {
>> +      init_loops_structure (dest_cfun, loops, 1);
>> +      loops->state |= loop_state_flags;
>> +    }
>> +
>> +  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
>>
>> this looks twisted just because you do a half-way init of the loop tree
>> here:
>>
>> +  if (loop->inner)
>> +    {
>> +      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
>> +      struct loops *loops = ggc_cleared_alloc<struct loops> ();
>> +      set_loops_for_fn (child_cfun, loops);
>> +      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
>> +    }
>>
>> why not unconditionally initialize the loop tree properly in autopar?
>>
>
> Because init_loops_structure accesses f.i. n_basic_blocks_for_fn
> (child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this
> point in parloops, there's no child_cfun->cfg yet, that field is set by the
> following pass_expand_omp_ssa.

Well.  The above exposes too much internals about how this all works
(IMHO).

> Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for the
> parent function, which will get propagated to the child. But I'm trying to
> be more precise than that, by only setting LOOPS_NEED_FIXUP for the child,
> not the parent.

Can we fix the root-cause of the issue instead?  That is, build a valid loop
structure in the first place?

Richard.

> Thanks,
> - Tom

Reply via email to