On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 28/07/15 12:11, Richard Biener wrote: >> >> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <tom_devr...@mentor.com> >> wrote: >>> >>> On 20/07/15 15:04, Tom de Vries wrote: >>>> >>>> >>>> On 16/07/15 12:15, Richard Biener wrote: >>>>> >>>>> >>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries >>>>> <tom_devr...@mentor.com> wrote: >>>>>> >>>>>> >>>>>> On 16/07/15 10:44, Richard Biener wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries >>>>>>> <tom_devr...@mentor.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I. >>>>>>>> >>>>>>>> In openmp expansion of loops, we do some effort to try to create >>>>>>>> matching >>>>>>>> loops in the loop state of the child function, f.i.in >>>>>>>> >>>>>>>> expand_omp_for_generic: >>>>>>>> ... >>>>>>>> struct loop *outer_loop; >>>>>>>> if (seq_loop) >>>>>>>> outer_loop = l0_bb->loop_father; >>>>>>>> else >>>>>>>> { >>>>>>>> outer_loop = alloc_loop (); >>>>>>>> outer_loop->header = l0_bb; >>>>>>>> outer_loop->latch = l2_bb; >>>>>>>> add_loop (outer_loop, l0_bb->loop_father); >>>>>>>> } >>>>>>>> >>>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt)) >>>>>>>> { >>>>>>>> struct loop *loop = alloc_loop (); >>>>>>>> loop->header = l1_bb; >>>>>>>> /* The loop may have multiple latches. */ >>>>>>>> add_loop (loop, outer_loop); >>>>>>>> } >>>>>>>> ... >>>>>>>> >>>>>>>> And if that doesn't work out, we try to mark the loop state for >>>>>>>> fixup, in >>>>>>>> expand_omp_taskreg and expand_omp_target: >>>>>>>> ... >>>>>>>> /* When the OMP expansion process cannot guarantee an >>>>>>>> up-to-date >>>>>>>> loop tree arrange for the child function to fixup >>>>>>>> loops. */ >>>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) >>>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; >>>>>>>> ... >>>>>>>> >>>>>>>> and expand_omp_for: >>>>>>>> ... >>>>>>>> else >>>>>>>> /* If there isn't a continue then this is a degerate case >>>>>>>> where >>>>>>>> the introduction of abnormal edges during lowering will >>>>>>>> prevent >>>>>>>> original loops from being detected. Fix that up. */ >>>>>>>> loops_state_set (LOOPS_NEED_FIXUP); >>>>>>>> ... >>>>>>>> >>>>>>>> However, loops are fixed up anyway, because the first pass we >>>>>>>> execute >>>>>>>> with >>>>>>>> the new child function is pass_fixup_cfg. >>>>>>>> >>>>>>>> The new child function contains a function call to >>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so >>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and >>>>>>>> subsequently >>>>>>>> the loops with LOOPS_NEED_FIXUP. >>>>>>>> >>>>>>>> >>>>>>>> II. >>>>>>>> >>>>>>>> This patch adds a verification that at the end of the omp-expand >>>>>>>> processing >>>>>>>> of the child function, either the loop structure is ok, or marked >>>>>>>> for >>>>>>>> fixup. >>>>>>>> >>>>>>>> This verfication triggered a failure in parloops. When an outer >>>>>>>> loop is >>>>>>>> being parallelized, both the outer and inner loop are cancelled. >>>>>>>> Then >>>>>>>> during >>>>>>>> omp-expansion, we create a loop in the loop state for the outer >>>>>>>> loop (the >>>>>>>> one that is transformed), but not for the inner, which causes the >>>>>>>> verification failure: >>>>>>>> ... >>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree >>>>>>>> ... >>>>>>>> >>>>>>>> [ I ran into this verification failure with an openacc kernels >>>>>>>> testcase >>>>>>>> on >>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a >>>>>>>> different >>>>>>>> location, and pass_fixup_cfg is not the first pass that the child >>>>>>>> function >>>>>>>> is processed by. ] >>>>>>>> >>>>>>>> The patch contains a bit that makes sure that the loop state of the >>>>>>>> child >>>>>>>> function is marked for fixup in parloops. The bit is non-trival >>>>>>>> since it >>>>>>>> create a loop state and sets the fixup flag on the loop state, but >>>>>>>> postpones >>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it >>>>>>>> can >>>>>>>> succeed. >>>>>>>> >>>>>>>> >>>> >>>> <SNIP> >>>> >>>>> Can we fix the root-cause of the issue instead? That is, build a >>>>> valid loop >>>>> structure in the first place? >>>>> >>>> >>>> This patch manages to keep the loop structure, that is, to not cancel >>>> the loop tree in parloops, and guarantee a valid loop structure at the >>>> end of parloops. >>>> >>>> The transformation to insert the omp_for invalidates the loop state >>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so >>>> we drop those in parloops. >>>> >>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of >>>> the omp_for, and keep it. >>>> >>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get >>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and >>>> LOOPS_HAVE_SIMPLE_LATCHES back. >>>> >>> >>> This updated patch tries a more minimal approach. >>> >>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the >>> new >>> exit instead. >>> >>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we >>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of >>> pass_expand_omp_ssa. >>> >>> Bootstrapped and reg-tested on x86_64. >>> >>> OK for trunk? >> >> >> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. >> turning >> that back on in execute_expand_omp). The parloops code lacks comments and >> the /* Prepare cfg. */ part looks twisty to me - but I don't see why >> it should be difficult >> to preserve simple latches as well - is this just because we insert >> the GIMPLE_OMP_CONTINUE >> in it? >> > > It's not difficult to do. It's just that the omp-for that is generated via > the normal route (omp-annotated source code) doesn't have this simple latch, > and parloops just mimics that cfg shape. > > Attached updated patch preserves simple latches in parloops. We need some > minor adjustments in omp-expand to handle that. > >> If execute_expand_omp is not performed in a loop pipeline where things >> expect simple latches >> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here >> at all? If somebody >> needs it it will just request simple latches. >> > > I've looked into this. The (first?) pass that gives me trouble is > pass_iv_optimize. It doesn't request simple latches, but it does need them.
Well, pass_iv_optimize is inside the loop pass which ensures we have LOOPS_NORMAL. Passes in it rely on that. >> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep >> LOOPS_HAVE_SIMPLE_LATCHES >> unset in execute_expand_omp. >> >> Ok with that change. >> > > OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if > bootstrap and reg-test succeeds)? Yes. Thanks, Richard. > Thanks, > - Tom >