On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote: > gcc/ > * internal-fn.c (expand_GOMP_SIMT_LANE): New. > (expand_GOMP_SIMT_VF): New. > (expand_GOMP_SIMT_LAST_LANE): New. > (expand_GOMP_SIMT_ORDERED_PRED): New. > (expand_GOMP_SIMT_VOTE_ANY): New. > (expand_GOMP_SIMT_XCHG_BFLY): New. > (expand_GOMP_SIMT_XCHG_IDX): New. > * internal-fn.def (GOMP_SIMT_LANE): New. > (GOMP_SIMT_VF): New. > (GOMP_SIMT_LAST_LANE): New. > (GOMP_SIMT_ORDERED_PRED): New. > (GOMP_SIMT_VOTE_ANY): New. > (GOMP_SIMT_XCHG_BFLY): New. > (GOMP_SIMT_XCHG_IDX): New. > * omp-low.c (omp_maybe_offloaded_ctx): New, outlined from... > (create_omp_child_function): ...here. Set "omp target entrypoint" > or "omp declare target" attribute based on is_gimple_omp_offloaded. > (omp_max_simt_vf): New. Use it... > (omp_max_vf): ...here. > (lower_rec_input_clauses): Add reduction lowering for SIMT execution. > (lower_lastprivate_clauses): Likewise, for "lastprivate" lowering. > (lower_omp_ordered): Likewise, for "ordered" lowering. > (expand_omp_simd): Add SIMT transforms. > (pass_data_lower_omp): Add PROP_gimple_lomp_dev. > (execute_omp_device_lower): New. > (pass_data_omp_device_lower): New. > (pass_omp_device_lower): New pass. > (make_pass_omp_device_lower): New. > * passes.def (pass_omp_device_lower): Position new pass. > * tree-pass.h (PROP_gimple_lomp_dev): Define. > (make_pass_omp_device_lower): Declare.
Ok for trunk, once the needed corresponding config/nvptx bits are committed, with one nit below that needs immediate action and the rest can be resolved incrementally. I'd like to check in afterwards the attached patch, at least for now, so that non-offloaded SIMD code is less affected. Once you have the intended outlining of SIMT regions for PTX offloading done (IMHO the best place to do that is in omp expansion, not gimplification), you can either base it on that, or revert and do earlier. > + > +/* Return maximum SIMT width if offloading may target SIMT hardware. */ > + > +static int > +omp_max_simt_vf (void) > +{ > + if (!optimize) > + return 0; > + if (ENABLE_OFFLOADING) > + for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; ) > + { > + if (!strncmp (c, "nvptx", strlen ("nvptx"))) > + return 32; > + else if ((c = strchr (c, ','))) > + c++; > + } > + return 0; > +} As discussed privately, this means one has to manually set OFFLOAD_TARGET_NAMES in the environment when invoking ./cc1 or ./cc1plus in order to match ./gcc -B ./ etc. behavior. I think it would be better to change the driver so that it sets OFFLOAD_TARGET_NAMES= in the environment when ENABLE_OFFLOADING, but -foffload option is used to disable all offloading and then in this function use the configured in offloading targets if ENABLE_OFFLOADING and OFFLOAD_TARGET_NAMES is not in the environment. Can be done incrementally. > + > /* Return maximum possible vectorization factor for the target. */ > > static int > @@ -4277,16 +4306,18 @@ omp_max_vf (void) > || global_options_set.x_flag_tree_vectorize))) > return 1; > > + int vf = 1; > int vs = targetm.vectorize.autovectorize_vector_sizes (); > if (vs) > + vf = 1 << floor_log2 (vs); > + else > { > - vs = 1 << floor_log2 (vs); > - return vs; > + machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode); > + if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT) > + vf = GET_MODE_NUNITS (vqimode); > } > - machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode); > - if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT) > - return GET_MODE_NUNITS (vqimode); > - return 1; > + int svf = omp_max_simt_vf (); > + return MAX (vf, svf); Increasing the vf even for host in non-offloaded regions is undesirable. Can be partly solved by the attached patch I'm planning to apply incrementally, the other part is for the simd modifier of schedule clause, there I think what we want is use conditional expression (GOMP_USE_SIMT () ? omp_max_simt_vf () : omp_max_vf). I'll try to handle the schedule clause later. > +class pass_omp_device_lower : public gimple_opt_pass > +{ > +public: > + pass_omp_device_lower (gcc::context *ctxt) > + : gimple_opt_pass (pass_data_omp_device_lower, ctxt) > + {} > + > + /* opt_pass methods: */ > + virtual bool gate (function *fun) > + { > + /* FIXME: inlining does not propagate the lomp_dev property. */ > + return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev); Please change this into (ENABLE_OFFLOADING && (flag_openmp || in_lto)) for now, so that we don't waste compile time even when clearly it isn't needed, and incrementally change the inliner to propagate the property. Jakub
2016-11-11 Jakub Jelinek <ja...@redhat.com> * internal-fn.c (expand_GOMP_USE_SIMT): New function. * tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands. (omp_clause_code_name): Add _simt_ name. (walk_tree_1): Handle OMP_CLAUSE__SIMT_. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_. (scan_omp_simd): New function. (scan_omp_1_stmt): Use it in target regions if needed. (omp_max_vf): Don't max with omp_max_simt_vf. (lower_rec_simd_input_clauses): Do it here, only if OMP_CLAUSE__SIMT_ is present. (lower_rec_input_clauses): Compute maybe_simt from presence of OMP_CLAUSE__SIMT_. (lower_lastprivate_clauses): Likewise. (expand_omp_simd): Likewise. (execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT. * internal-fn.def (GOMP_USE_SIMT): New internal function. * tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_. --- gcc/internal-fn.c.jj 2016-10-12 10:38:54.000000000 +0200 +++ gcc/internal-fn.c 2016-10-24 15:25:58.162292706 +0200 @@ -154,6 +154,12 @@ expand_ANNOTATE (internal_fn, gcall *) gcc_unreachable (); } +static void +expand_GOMP_USE_SIMT (internal_fn, gcall *) +{ + gcc_unreachable (); +} + /* Lane index on SIMT targets: thread index in the warp on NVPTX. On targets without SIMT execution this should be expanded in omp_device_lower pass. */ --- gcc/tree.c.jj 2016-10-12 10:38:56.000000000 +0200 +++ gcc/tree.c 2016-10-24 15:49:48.487890952 +0200 @@ -320,6 +320,7 @@ unsigned const char omp_clause_num_ops[] 1, /* OMP_CLAUSE_HINT */ 0, /* OMP_CLAUSE_DEFALTMAP */ 1, /* OMP_CLAUSE__SIMDUID_ */ + 0, /* OMP_CLAUSE__SIMT_ */ 1, /* OMP_CLAUSE__CILK_FOR_COUNT_ */ 0, /* OMP_CLAUSE_INDEPENDENT */ 1, /* OMP_CLAUSE_WORKER */ @@ -392,6 +393,7 @@ const char * const omp_clause_code_name[ "hint", "defaultmap", "_simduid_", + "_simt_", "_Cilk_for_count_", "independent", "worker", @@ -11671,6 +11673,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: case OMP_CLAUSE_TILE: + case OMP_CLAUSE__SIMT_: WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); case OMP_CLAUSE_LASTPRIVATE: --- gcc/tree-core.h.jj 2016-10-12 10:38:55.000000000 +0200 +++ gcc/tree-core.h 2016-10-24 15:46:48.996193955 +0200 @@ -435,6 +435,10 @@ enum omp_clause_code { /* Internally used only clause, holding SIMD uid. */ OMP_CLAUSE__SIMDUID_, + /* Internally used only clause, flag whether this is SIMT simd + loop or not. */ + OMP_CLAUSE__SIMT_, + /* Internally used only clause, holding _Cilk_for # of iterations on OMP_PARALLEL. */ OMP_CLAUSE__CILK_FOR_COUNT_, --- gcc/omp-low.c.jj 2016-10-12 10:38:54.000000000 +0200 +++ gcc/omp-low.c 2016-10-25 17:54:39.563307069 +0200 @@ -275,6 +275,7 @@ static bool omp_any_child_fn_dumped; static void scan_omp (gimple_seq *, omp_context *); static tree scan_omp_1_op (tree *, int *, void *); static gphi *find_phi_with_arg_on_edge (tree, edge); +static int omp_max_simt_vf (void); #define WALK_SUBSTMTS \ case GIMPLE_BIND: \ @@ -2188,6 +2189,7 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE__SIMT_: break; case OMP_CLAUSE_ALIGNED: @@ -2363,6 +2365,7 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: case OMP_CLAUSE__GRIDDIM_: + case OMP_CLAUSE__SIMT_: break; case OMP_CLAUSE_DEVICE_RESIDENT: @@ -3067,6 +3070,48 @@ scan_omp_for (gomp_for *stmt, omp_contex scan_omp (gimple_omp_body_ptr (stmt), ctx); } +/* Duplicate #pragma omp simd, one for SIMT, another one for SIMD. */ + +static void +scan_omp_simd (gimple_stmt_iterator *gsi, gomp_for *stmt, + omp_context *outer_ctx) +{ + gbind *bind = gimple_build_bind (NULL, NULL, NULL); + gsi_replace (gsi, bind, false); + gimple_seq seq = NULL; + gimple *g = gimple_build_call_internal (IFN_GOMP_USE_SIMT, 0); + tree cond = create_tmp_var_raw (boolean_type_node); + DECL_CONTEXT (cond) = current_function_decl; + DECL_SEEN_IN_BIND_EXPR_P (cond) = 1; + gimple_bind_set_vars (bind, cond); + gimple_call_set_lhs (g, cond); + gimple_seq_add_stmt (&seq, g); + tree lab1 = create_artificial_label (UNKNOWN_LOCATION); + tree lab2 = create_artificial_label (UNKNOWN_LOCATION); + tree lab3 = create_artificial_label (UNKNOWN_LOCATION); + g = gimple_build_cond (NE_EXPR, cond, boolean_false_node, lab1, lab2); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_label (lab1); + gimple_seq_add_stmt (&seq, g); + gimple_seq new_seq = copy_gimple_seq_and_replace_locals (stmt); + gomp_for *new_stmt = as_a <gomp_for *> (new_seq); + tree clause = build_omp_clause (gimple_location (stmt), OMP_CLAUSE__SIMT_); + OMP_CLAUSE_CHAIN (clause) = gimple_omp_for_clauses (new_stmt); + gimple_omp_for_set_clauses (new_stmt, clause); + gimple_seq_add_stmt (&seq, new_stmt); + g = gimple_build_goto (lab3); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_label (lab2); + gimple_seq_add_stmt (&seq, g); + gimple_seq_add_stmt (&seq, stmt); + g = gimple_build_label (lab3); + gimple_seq_add_stmt (&seq, g); + gimple_bind_set_body (bind, seq); + update_stmt (bind); + scan_omp_for (new_stmt, outer_ctx); + scan_omp_for (stmt, outer_ctx); +} + /* Scan an OpenMP sections directive. */ static void @@ -3955,7 +4000,13 @@ scan_omp_1_stmt (gimple_stmt_iterator *g break; case GIMPLE_OMP_FOR: - scan_omp_for (as_a <gomp_for *> (stmt), ctx); + if (((gimple_omp_for_kind (as_a <gomp_for *> (stmt)) + & GF_OMP_FOR_KIND_MASK) == GF_OMP_FOR_KIND_SIMD) + && omp_maybe_offloaded_ctx (ctx) + && omp_max_simt_vf ()) + scan_omp_simd (gsi, as_a <gomp_for *> (stmt), ctx); + else + scan_omp_for (as_a <gomp_for *> (stmt), ctx); break; case GIMPLE_OMP_SECTIONS: @@ -4300,8 +4351,7 @@ omp_max_vf (void) if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT) vf = GET_MODE_NUNITS (vqimode); } - int svf = omp_max_simt_vf (); - return MAX (vf, svf); + return vf; } /* Helper function of lower_rec_input_clauses, used for #pragma omp simd @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v if (max_vf == 0) { max_vf = omp_max_vf (); + if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt), + OMP_CLAUSE__SIMT_)) + { + int max_simt = omp_max_simt_vf (); + max_vf = MAX (max_vf, max_simt); + } if (max_vf > 1) { tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt), @@ -4387,8 +4443,7 @@ lower_rec_input_clauses (tree clauses, g int pass; bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD); - bool maybe_simt - = is_simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1; + bool maybe_simt = is_simd && find_omp_clause (clauses, OMP_CLAUSE__SIMT_); int max_vf = 0; tree lane = NULL_TREE, idx = NULL_TREE; tree simt_lane = NULL_TREE; @@ -5477,7 +5532,7 @@ lower_lastprivate_clauses (tree clauses, if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD) { - maybe_simt = omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1; + maybe_simt = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMT_); simduid = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMDUID_); if (simduid) simduid = OMP_CLAUSE__SIMDUID__DECL (simduid); @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi bool offloaded = cgraph_node::get (current_function_decl)->offloadable; for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer) offloaded = rgn->type == GIMPLE_OMP_TARGET; - bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1; + bool is_simt + = (offloaded + && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt), + OMP_CLAUSE__SIMT_) + && safelen_int > 1); tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE; if (is_simt) { @@ -21358,6 +21417,9 @@ execute_omp_device_lower () tree type = lhs ? TREE_TYPE (lhs) : integer_type_node; switch (gimple_call_internal_fn (stmt)) { + case IFN_GOMP_USE_SIMT: + rhs = vf == 1 ? boolean_false_node : boolean_true_node; + break; case IFN_GOMP_SIMT_LANE: case IFN_GOMP_SIMT_LAST_LANE: rhs = vf == 1 ? build_zero_cst (type) : NULL_TREE; --- gcc/internal-fn.def.jj 2016-10-12 10:38:54.000000000 +0200 +++ gcc/internal-fn.def 2016-10-24 15:24:32.468380502 +0200 @@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary) DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary) +DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) --- gcc/tree-pretty-print.c.jj 2016-10-12 10:38:55.000000000 +0200 +++ gcc/tree-pretty-print.c 2016-10-24 15:50:32.043336116 +0200 @@ -812,6 +812,10 @@ dump_omp_clause (pretty_printer *pp, tre pp_right_paren (pp); break; + case OMP_CLAUSE__SIMT_: + pp_string (pp, "_simt_"); + break; + case OMP_CLAUSE_GANG: pp_string (pp, "gang"); if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE) Jakub