On Thu, 3 Dec 2015, Tom de Vries wrote:

> On 03/12/15 09:59, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Tom de Vries wrote:
> >
> > > On 03/12/15 01:10, Tom de Vries wrote:
> > > >
> > > > I've managed to reproduce it. The difference between pass and fail is
> > > > whether the compiler is configured with or without accelerator.
> > > >
> > > > I'll look into it.
> > >
> > > In the configuration with accelerator, the flag node->force_output is on
> > > for foo._omp.fn.
> > >
> > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> > > optimization to fail.
> > >
> > > The flag is described as:
> > > ...
> > >   /* The symbol will be assumed to be used in an invisible way (like
> > >      by an toplevel asm statement).  */
> > > ...
> > >
> > > Looks like I have to ignore the force_output flag as well in
> > > ipa_pta_execute for this sort of node.
> >
> > It rather looks like the flag shouldn't be set. The fn after all has
> > its address taken!(?)
> >
>
> The flag is set here in expand_omp_target:
> ...
>   /* Prevent IPA from removing child_fn as unreachable, since there are no
>      refs from the parent function to child_fn in offload LTO mode.  */
>   if (ENABLE_OFFLOADING)
>     cgraph_node::get (child_fn)->mark_force_output ();
> ...
>

How are there no refs from the "parent"? Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.

> I guess setting forced_by_abi instead would also mean child_fn is not removed
> as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>      symbols promoted to static and it does not inhibit
>      optimization.  */
>   unsigned forced_by_abi : 1;
> ...
>
> But I suspect that other optimizations (than ipa-pta) might break things.

How so?

> Essentially we have two situations:
> - in the host compiler, there is no need for the force_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
>
> I wonder if setting the force_output flag only when streaming the bytecode for
> offloading would work. That way, it wouldn't be set in the host compiler,
> while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

Richard.