On Fri, Oct 14, 2022 at 11:27:07AM +0000, Richard Biener wrote: > > --- gcc/function.h.jj 2022-10-10 11:57:40.163722972 +0200 > > +++ gcc/function.h 2022-10-12 19:48:28.887554771 +0200 > > @@ -438,6 +438,10 @@ struct GTY(()) function { > > > > /* Set if there are any OMP_TARGET regions in the function. */ > > unsigned int has_omp_target : 1; > > + > > + /* Set for artificial function created for [[assume (cond)]]. > > + These should be GIMPLE optimized, but not expanded to RTL. */ > > + unsigned int assume_function : 1; > > I wonder if we should have this along force_output in the symtab > node and let the symtab code decide whether to expand?
I actually first had a flag on the symtab node but as the patch shows, when it needs to be tested, more frequently I have access to struct function than to cgraph node. > > --- gcc/gimplify.cc.jj 2022-10-10 11:57:40.165722944 +0200 > > +++ gcc/gimplify.cc 2022-10-12 19:48:28.890554730 +0200 > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple > > fndecl, 0)); > > return GS_OK; > > } > > - /* FIXME: Otherwise expand it specially. */ > > + /* If not optimizing, ignore the assumptions. */ > > + if (!optimize) > > + { > > + *expr_p = NULL_TREE; > > + return GS_ALL_DONE; > > + } > > + /* Temporarily, until gimple lowering, transform > > + .ASSUME (cond); > > + into: > > + guard = .ASSUME (); > > + if (guard) goto label_true; else label_false; > > + label_true:; > > + { > > + guard = cond; > > + } > > + label_false:; > > + .ASSUME (guard); > > + such that gimple lowering can outline the condition into > > + a separate function easily. */ > > So the idea to use lambdas and/or nested functions (for OMP) > didn't work out or is more complicated? Yes, that didn't work out. Both lambda creation and nested function handling produce big structures with everything while for the assumptions it is better to have separate scalars if possible, lambda creation has various language imposed restrictions, diagnostics etc. and isn't available in C and I think the outlining in the patch is pretty simple and short. > I wonder if, instead of using the above intermediate form we > can have a new structued GIMPLE code with sub-statements > > .ASSUME > { > condition; > } That is what I wrote in the patch description as alternative: "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't extra clean but it is just something to hold it from gimplifier until gimple low pass; it reassembles if (condition_never_true) { cond; }; an alternative would be introduce GOMP_ASSUME statement that would have the guard var as operand and the GIMPLE_BIND as body, but for the few passes (tree-nested and omp lowering) in between that looked like an overkill to me)" I can certainly implement that easily. > ? There's gimple_statement_omp conveniently available as base and > IIRC you had the requirement to implement some OMP assume as well? For OpenMP assumptions we right now implement just the holds clause of assume and implement it the same way as assume/gnu::assume attributes. > Of ocurse a different stmt class with body would work as well here, > maybe we can even use a gbind with a special flag. > > The outlining code can then be ajusted to outline a single BIND? It already is adjusting a single bind (of course with everything nested in it). > It probably won't simplify much that way. > > +static tree > > +create_assumption_fn (location_t loc) > > +{ > > + tree name = clone_function_name_numbered (current_function_decl, > > "_assume"); > > + /* For now, will be changed later. */ > > ? I need to create the FUNCTION_DECL early and only later on discover the used automatic vars (for which I need the destination function) and only once those are discovered I can create the right function type for the function. > > + tree type = TREE_TYPE (current_function_decl); > > + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl) > > + = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl); > > + DECL_FUNCTION_SPECIFIC_TARGET (decl) > > + = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > > + DECL_FUNCTION_VERSIONED (decl) > > + = DECL_FUNCTION_VERSIONED (current_function_decl); > > what does it mean to copy DECL_FUNCTION_VERSIONED here? This was a copy and paste from elsewhere (I think OpenMP code). I guess I can nuke it and even better add some testcase coverage for various nasty things like assume in multi-versioned functions. > > + DECL_ARGUMENTS (lad.id.dst_fn) = parms; > > + TREE_TYPE (lad.id.dst_fn) = build_function_type (boolean_type_node, > > parmt); > > ah, here's the type. Maybe use error_mark_node as transitional type? Will see if that works. > > > + cgraph_node::add_new_function (lad.id.dst_fn, false); > > + > > + for (unsigned i = 0; i < sz; ++i) > > + { > > + tree v = lad.decls[i]; > > + if (TREE_CODE (v) == SSA_NAME) > > + release_ssa_name (v); > > + } > > + > > + stmt = gsi_stmt (*gsi); > > + lab = as_a <glabel *> (stmt); > > + gcc_assert (gimple_label_label (lab) == l1 > > + || gimple_label_label (lab) == l2); > > + gsi_remove (gsi, true); > > + stmt = gsi_stmt (*gsi); > > + gcc_assert (gimple_call_internal_p (stmt, IFN_ASSUME) > > + && gimple_call_num_args (stmt) == 1 > > + && gimple_call_arg (stmt, 0) == guard); > > + data->cannot_fallthru = false; > > + gcall *call = gimple_build_call_internal_vec (IFN_ASSUME, vargs); > > + gimple_set_location (call, loc); > > + gsi_replace (gsi, call, true); > > +} > > > > /* Lower statement GSI. DATA is passed through the recursion. We try to > > track the fallthruness of statements and get rid of unreachable return > > @@ -354,6 +738,13 @@ lower_stmt (gimple_stmt_iterator *gsi, s > > tree decl = gimple_call_fndecl (stmt); > > unsigned i; > > > > + if (gimple_call_internal_p (stmt, IFN_ASSUME) > > + && gimple_call_num_args (stmt) == 0) > > + { > > + lower_assumption (gsi, data); > > + return; > > + } > > + > > for (i = 0; i < gimple_call_num_args (stmt); i++) > > { > > tree arg = gimple_call_arg (stmt, i); > > --- gcc/tree-ssa-ccp.cc.jj 2022-10-10 11:57:40.203722414 +0200 > > +++ gcc/tree-ssa-ccp.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -4253,6 +4253,12 @@ pass_fold_builtins::execute (function *f > > } > > > > callee = gimple_call_fndecl (stmt); > > + if (!callee > > + && gimple_call_internal_p (stmt, IFN_ASSUME)) > > + { > > + gsi_remove (&i, true); > > + continue; > > + } > > if (!callee || !fndecl_built_in_p (callee, BUILT_IN_NORMAL)) > > { > > gsi_next (&i); > > --- gcc/lto-streamer-out.cc.jj 2022-10-10 11:57:40.202722428 +0200 > > +++ gcc/lto-streamer-out.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -2278,6 +2278,7 @@ output_struct_function_base (struct outp > > bp_pack_value (&bp, fn->calls_eh_return, 1); > > bp_pack_value (&bp, fn->has_force_vectorize_loops, 1); > > bp_pack_value (&bp, fn->has_simduid_loops, 1); > > + bp_pack_value (&bp, fn->assume_function, 1); > > bp_pack_value (&bp, fn->va_list_fpr_size, 8); > > bp_pack_value (&bp, fn->va_list_gpr_size, 8); > > bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8); > > --- gcc/lto-streamer-in.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > +++ gcc/lto-streamer-in.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -1318,6 +1318,7 @@ input_struct_function_base (struct funct > > fn->calls_eh_return = bp_unpack_value (&bp, 1); > > fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1); > > fn->has_simduid_loops = bp_unpack_value (&bp, 1); > > + fn->assume_function = bp_unpack_value (&bp, 1); > > fn->va_list_fpr_size = bp_unpack_value (&bp, 8); > > fn->va_list_gpr_size = bp_unpack_value (&bp, 8); > > fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8); > > --- gcc/cgraphunit.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > +++ gcc/cgraphunit.cc 2022-10-12 19:48:28.892554703 +0200 > > @@ -1882,6 +1882,16 @@ cgraph_node::expand (void) > > ggc_collect (); > > timevar_pop (TV_REST_OF_COMPILATION); > > > > + if (DECL_STRUCT_FUNCTION (decl) > > + && DECL_STRUCT_FUNCTION (decl)->assume_function) > > + { > > + /* Assume functions aren't expanded into RTL, on the other side > > + we don't want to release their body. */ > > + if (cfun) > > + pop_cfun (); > > + return; > > + } > > + > > /* Make sure that BE didn't give up on compiling. */ > > gcc_assert (TREE_ASM_WRITTEN (decl)); > > if (cfun) > > --- gcc/cfgexpand.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > +++ gcc/cfgexpand.cc 2022-10-12 19:48:28.893554689 +0200 > > @@ -6597,6 +6597,14 @@ pass_expand::execute (function *fun) > > rtx_insn *var_seq, *var_ret_seq; > > unsigned i; > > > > + if (cfun->assume_function) > > + { > > + /* Assume functions should not be expanded to RTL. */ > > can we avoid getting here in the first place? I think we don't need > any of the post-pass_all_optimizations[_g] passes? I'm afraid not without revamping passes.def, because to easily cat the pass queue from certain point onwards, we need all the remaining passes to be wrapped with PUSH_INSERT_PASSES_WITHIN. So, if we e.g. wanted to cut out everything from pass_tm_init onwards, we'd need to wrap: NEXT_PASS (pass_tm_init); PUSH_INSERT_PASSES_WITHIN (pass_tm_init) NEXT_PASS (pass_tm_mark); NEXT_PASS (pass_tm_memopt); NEXT_PASS (pass_tm_edges); POP_INSERT_PASSES () NEXT_PASS (pass_simduid_cleanup); NEXT_PASS (pass_vtable_verify); NEXT_PASS (pass_lower_vaarg); NEXT_PASS (pass_lower_vector); NEXT_PASS (pass_lower_complex_O0); NEXT_PASS (pass_sancov_O0); NEXT_PASS (pass_lower_switch_O0); NEXT_PASS (pass_asan_O0); NEXT_PASS (pass_tsan_O0); NEXT_PASS (pass_sanopt); NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_lower_resx); NEXT_PASS (pass_nrv); NEXT_PASS (pass_gimple_isel); NEXT_PASS (pass_harden_conditional_branches); NEXT_PASS (pass_harden_compares); NEXT_PASS (pass_warn_access, /*early=*/false); NEXT_PASS (pass_cleanup_cfg_post_optimizing); NEXT_PASS (pass_warn_function_noreturn); NEXT_PASS (pass_expand); in some wrapper pass with a gate (either also including pass_rest_of_compilation but that would mean undesirable reindentation of everything there, or just the above ones and have assume_function punt in the 2 or 1 gates). What I had in the patch was just skip pass_expand and pass_rest_of_compilation, perhaps another possibility to do the former would be to define a gate on pass_expand. > > --- gcc/tree-vectorizer.cc.jj 2022-10-10 11:57:40.204722400 +0200 > > +++ gcc/tree-vectorizer.cc 2022-10-12 19:48:28.894554675 +0200 > > @@ -1213,6 +1213,10 @@ public: > > /* opt_pass methods: */ > > bool gate (function *fun) final override > > { > > + /* Vectorization makes range analysis of assume functions > > + harder, not easier. */ > > + if (fun->assume_function) > > + return false; > > return flag_tree_loop_vectorize || fun->has_force_vectorize_loops; > > } > > > > @@ -1490,7 +1494,14 @@ public: > > > > /* opt_pass methods: */ > > opt_pass * clone () final override { return new pass_slp_vectorize > > (m_ctxt); } > > - bool gate (function *) final override { return flag_tree_slp_vectorize > > != 0; } > > + bool gate (function *fun) final override > > + { > > + /* Vectorization makes range analysis of assume functions harder, > > + not easier. */ > > Can we split out these kind of considerations from the initial patch? Sure. > > > + if (fun->assume_function) > > + return false; > > + return flag_tree_slp_vectorize != 0; > > + } > > unsigned int execute (function *) final override; > > > > }; // class pass_slp_vectorize > > --- gcc/ipa-icf.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > +++ gcc/ipa-icf.cc 2022-10-12 19:48:28.894554675 +0200 > > @@ -1517,6 +1517,9 @@ sem_function::parse (cgraph_node *node, > > if (!func || (!node->has_gimple_body_p () && !node->thunk)) > > return NULL; > > > > + if (func->assume_function) > > + return NULL; > > + > > Do we want implicit noipa attribute on the assume functions? Or do > we need to IPA-CP into them? I suppose the ranger code can use > contextual code from the .ASSUME call for things like > assume (side-effect(), i == 1); Most of normal IPA optimizations are disabled for them because they aren't normally called, all we do is take their address and pass it to .ASSUME. IPA-ICF was an exception and I had to disable it because when it triggered it decided to create a thunk which failed to assemble. But implicit noipa surely is an option; though of course we want inlining etc. to happen into those functions. And eventually some kind of IPA SRA of their arguments but with different behavior from normal IPA-SRA. > Reading some of the patch I guessed you wanted to handle nested > assumes. So - is > > [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]] > > a thing? This is not valid, assume can be just on an empty statement. But with GNU statement expressions it is a thing and I should add it to testsuite coverage. void foo (int a, int b) { [[assume (a == 4 && ({ [[assume (b == 3)]]; b != 2 }))]]; } is valid. I think the code should handle it fine (outline the outer and the new outlined function enters pass queue with all_lowering_passes and so will see pass_lower_cf again and hopefully work. Will see how it goes when I tweak the patch. Jakub