Hi!  This patch implements DOACROSS expansion: it both tweaks the omp for expansion to set up everything that is needed and call the new APIs, and expands the ordered depend regions too.  In addition, it fixes some bugs in lower_omp_ordered_clauses.  In particular, the indices other than the first one (or, for collapsed loops, the first few) should be the indices of the lexically latest iteration, so for forward loops and ordered(2) the fold is actually a maximum, not a minimum, and for, say, ordered(3) collapse(1) loops it shouldn't take the maximum or minimum of each index individually, but instead pick the sink vector whose outermost dimension after the collapsed ones is maximal or minimal (and if multiple sink vectors have the same outermost element, compare the second one, and so on).
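To illustrate the lexically latest rule on a concrete case (an illustrative variant of the sink-fold testcases, not something from this patch; i, j, N and bar are as in those tests):

  #pragma omp parallel for ordered(2)
  for (i = 0; i < N; i++)
    for (j = 0; j < N; ++j)
      {
  #pragma omp ordered depend(sink:i-2,j-1) depend(sink:i-2,j+3)
	bar ();
  #pragma omp ordered depend(source)
      }

Both sink vectors have the same outermost offset -2, and because j iterates forward, (i-2,j+3) is lexically later than (i-2,j-1); waiting for it therefore subsumes the other wait, so the fold should keep depend(sink:i-2,j+3) rather than take the elementwise minimum.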
Various things are still not implemented, like loops with unsigned long and long long/unsigned long long iterators.  Also, apparently we can't use the GCD optimization if the first POST in the loop is not dominated by the WAITs (which probably means we'll have to move that optimization from lowering to expansion).  Collapse > 1 is not handled in the optimization either.  And for unsigned iterators I have various questions that need to be clarified in the standard.  The library side is mostly missing for now; all I've done is implement the loop start APIs, so that I can at least test the expand_omp_for_generic expansion somewhat.  Next week I'm going to create the needed data structures during initialization and actually implement (perhaps only with busy waiting for now) the post/wait calls.

2015-09-18  Jakub Jelinek  <ja...@redhat.com>

	* gimplify.c (gimplify_omp_for): Push into loop_iter_var vector
	both the original and new decl.
	(gimplify_omp_ordered): Update the decl in TREE_VALUE from the
	original to the new decl.
	* omp-low.c (struct omp_region): Adjust comments, add ord_stmt
	field.
	(extract_omp_for_data): Canonicalize cond_code even for ordered
	loops after collapsed ones.  If loops is non-NULL, fd->collapse == 1
	and fd->ordered > 1, treat the outermost loop similarly to
	collapsed ones, n1 == 0, step == 1, n2 == constant or variable
	number of iterations.
	(check_omp_nesting_restrictions): Only check outer context when
	verifying ordered depend construct is closely nested in for ordered
	construct.
	(expand_omp_for_init_counts): Rename zero_iter_bb argument to
	zero_iter1_bb and first_zero_iter to first_zero_iter1, add
	zero_iter2_bb and first_zero_iter2 arguments, handle computation
	of counts even for ordered loops.
	(expand_omp_ordered_source, expand_omp_ordered_sink,
	expand_omp_ordered_source_sink): New functions.
	(expand_omp_for_ordered_loops): Add counts argument, initialize
	the counts vars if needed.  Fix up !gsi_end_p (gsi) handling, use
	the right step for each loop.
	(expand_omp_for_generic): Handle expansion of doacross loops.
	(expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
	expand_omp_simd, expand_omp_taskloop_for_outer,
	expand_omp_taskloop_for_inner): Adjust expand_omp_for_init_counts
	callers.
	(expand_omp_for): Handle doacross loops.
	(expand_omp): Don't expand ordered depend constructs here, record
	ord_stmt instead for later expand_omp_for_generic.
	(lower_omp_ordered_clauses): Don't ICE on collapsed loops, just
	give up on them for now.  For loops other than the first or
	collapsed ones compute lexically latest loop rather than minimum
	or maximum from each constant separately.  Simplify.
	* omp-builtins.def (BUILT_IN_GOMP_LOOP_DOACROSS_STATIC_START,
	BUILT_IN_GOMP_LOOP_DOACROSS_DYNAMIC_START,
	BUILT_IN_GOMP_LOOP_DOACROSS_GUIDED_START,
	BUILT_IN_GOMP_LOOP_DOACROSS_RUNTIME_START,
	BUILT_IN_GOMP_DOACROSS_POST, BUILT_IN_GOMP_DOACROSS_WAIT): New.
	* builtin-types.def (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
	BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
	BT_FN_VOID_LONG_VAR): New.
gcc/fortran/
	* types.def (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
	BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
	BT_FN_VOID_LONG_VAR): New.
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_1): Define.
gcc/testsuite/
	* c-c++-common/gomp/sink-4.c: Don't expect the constant to have
	pointer type.
	* gcc.dg/gomp/sink-fold-3.c: Likewise.
	* gcc.dg/gomp/sink-fold-1.c (k): New variable.
	(funk): Add another ordered loop, use better test values and
	adjust the expected result.
libgomp/
	* libgomp.map (GOMP_4.1): Add GOMP_loop_doacross_dynamic_start,
	GOMP_loop_doacross_guided_start, GOMP_loop_doacross_runtime_start,
	GOMP_loop_doacross_static_start, GOMP_doacross_post and
	GOMP_doacross_wait exports.
	* ordered.c: Include stdarg.h.
	(GOMP_doacross_post, GOMP_doacross_wait): New functions.
	* loop.c (gomp_loop_doacross_static_start,
	gomp_loop_doacross_dynamic_start, gomp_loop_doacross_guided_start,
	GOMP_loop_doacross_runtime_start, GOMP_loop_doacross_static_start,
	GOMP_loop_doacross_dynamic_start, GOMP_loop_doacross_guided_start):
	New functions.
	* libgomp_g.h (GOMP_loop_doacross_runtime_start,
	GOMP_loop_doacross_static_start, GOMP_loop_doacross_dynamic_start,
	GOMP_loop_doacross_guided_start, GOMP_doacross_post,
	GOMP_doacross_wait): New prototypes.

--- gcc/gimplify.c.jj	2015-09-10 11:06:30.000000000 +0200
+++ gcc/gimplify.c	2015-09-18 18:11:20.285278680 +0200
@@ -7785,7 +7785,8 @@ gimplify_omp_for (tree *expr_p, gimple_s
 	{
 	  is_doacross = true;
 	  gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH
-						   (OMP_FOR_INIT (for_stmt)));
+						   (OMP_FOR_INIT (for_stmt))
+						   * 2);
 	}
       for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
 	{
@@ -7802,6 +7803,7 @@ gimplify_omp_for (tree *expr_p, gimple_s
 	      (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i));
 	  else
 	    gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
+	  gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
 	}
 
       /* Make sure the iteration variable is private.  */
@@ -8742,19 +8744,23 @@ gimplify_omp_ordered (tree expr, gimple_
 	for (decls = OMP_CLAUSE_DECL (c), i = 0;
 	     decls && TREE_CODE (decls) == TREE_LIST;
 	     decls = TREE_CHAIN (decls), ++i)
-	  if (i < gimplify_omp_ctxp->loop_iter_var.length ()
-	      && TREE_VALUE (decls) != gimplify_omp_ctxp->loop_iter_var[i])
+	  if (i >= gimplify_omp_ctxp->loop_iter_var.length () / 2)
+	    continue;
+	  else if (TREE_VALUE (decls)
+		   != gimplify_omp_ctxp->loop_iter_var[2 * i])
 	    {
 	      error_at (OMP_CLAUSE_LOCATION (c),
 			"variable %qE is not an iteration "
 			"of outermost loop %d, expected %qE",
 			TREE_VALUE (decls), i + 1,
-			gimplify_omp_ctxp->loop_iter_var[i]);
+			gimplify_omp_ctxp->loop_iter_var[2 * i]);
 	      fail = true;
 	      failures++;
 	    }
-	/* Avoid being too redundant.  */
-	if (!fail && i != gimplify_omp_ctxp->loop_iter_var.length ())
+	  else
+	    TREE_VALUE (decls)
+	      = gimplify_omp_ctxp->loop_iter_var[2 * i + 1];
+	if (!fail && i != gimplify_omp_ctxp->loop_iter_var.length () / 2)
 	  {
 	    error_at (OMP_CLAUSE_LOCATION (c),
 		      "number of variables in depend(sink) "
--- gcc/omp-low.c.jj	2015-09-14 15:00:15.000000000 +0200
+++ gcc/omp-low.c	2015-09-18 18:28:42.334623281 +0200
@@ -96,7 +96,7 @@ along with GCC; see the file COPYING3.
 
 /* OMP region information.  Every parallel and workshare
    directive is enclosed between two markers, the OMP_* directive
-   and a corresponding OMP_RETURN statement.  */
+   and a corresponding GIMPLE_OMP_RETURN statement.  */
 
 struct omp_region
 {
@@ -112,10 +112,10 @@ struct omp_region
   /* Block containing the omp directive as its last stmt.  */
   basic_block entry;
 
-  /* Block containing the OMP_RETURN as its last stmt.  */
+  /* Block containing the GIMPLE_OMP_RETURN as its last stmt.  */
   basic_block exit;
 
-  /* Block containing the OMP_CONTINUE as its last stmt.  */
+  /* Block containing the GIMPLE_OMP_CONTINUE as its last stmt.  */
   basic_block cont;
 
   /* If this is a combined parallel+workshare region, this is a list
@@ -126,11 +126,15 @@ struct omp_region
   /* The code for the omp directive of this region.  */
   enum gimple_code type;
 
-  /* Schedule kind, only used for OMP_FOR type regions.  */
+  /* Schedule kind, only used for GIMPLE_OMP_FOR type regions.  */
   enum omp_clause_schedule_kind sched_kind;
 
   /* True if this is a combined parallel+workshare region.  */
   bool is_combined_parallel;
+
+  /* The ordered stmt if type is GIMPLE_OMP_ORDERED and it has
+     a depend clause.  */
+  gomp_ordered *ord_stmt;
 };
 
 /* Levels of parallelism as defined by OpenACC.  Increasing numbers
@@ -475,6 +479,7 @@ extract_omp_for_data (gomp_for *for_stmt
 		      == GF_OMP_FOR_KIND_DISTRIBUTE;
   bool taskloop = gimple_omp_for_kind (for_stmt)
 		  == GF_OMP_FOR_KIND_TASKLOOP;
+  tree iterv, countv;
 
   fd->for_stmt = for_stmt;
   fd->pre = NULL;
@@ -527,6 +532,14 @@ extract_omp_for_data (gomp_for *for_stmt
 	default:
 	  break;
 	}
+  if (fd->ordered && fd->collapse == 1 && loops != NULL)
+    {
+      fd->loops = loops;
+      iterv = NULL_TREE;
+      countv = NULL_TREE;
+      collapse_iter = &iterv;
+      collapse_count = &countv;
+    }
 
   /* FIXME: for now map schedule(auto) to schedule(static).
      There should be analysis to determine whether all iterations
@@ -555,7 +568,7 @@ extract_omp_for_data (gomp_for *for_stmt
   int cnt = fd->collapse + (fd->ordered > 0 ? fd->ordered - 1 : 0);
   for (i = 0; i < cnt; i++)
     {
-      if (i == 0 && fd->collapse == 1)
+      if (i == 0 && fd->collapse == 1 && (fd->ordered == 0 || loops == NULL))
	loop = &fd->loop;
       else if (loops != NULL)
	loop = loops + i;
@@ -583,8 +596,6 @@ extract_omp_for_data (gomp_for *for_stmt
 		      == GF_OMP_FOR_KIND_CILKFOR));
	  break;
	case LE_EXPR:
-	  if (i >= fd->collapse)
-	    break;
	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, 1);
	  else
@@ -594,8 +605,6 @@ extract_omp_for_data (gomp_for *for_stmt
	  loop->cond_code = LT_EXPR;
	  break;
	case GE_EXPR:
-	  if (i >= fd->collapse)
-	    break;
	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, -1);
	  else
@@ -763,7 +772,7 @@ extract_omp_for_data (gomp_for *for_stmt
	*collapse_count = create_tmp_var (iter_type, ".count");
     }
 
-  if (fd->collapse > 1)
+  if (fd->collapse > 1 || (fd->ordered && loops))
     {
       fd->loop.v = *collapse_iter;
       fd->loop.n1 = build_int_cst (TREE_TYPE (fd->loop.v), 0);
@@ -3362,20 +3371,14 @@ check_omp_nesting_restrictions (gimple s
	  if (kind == OMP_CLAUSE_DEPEND_SOURCE
	      || kind == OMP_CLAUSE_DEPEND_SINK)
	    {
-	      bool have_ordered = false;
	      tree oclause;
	      /* Look for containing ordered(N) loop.  */
-	      for (omp_context *octx = ctx; octx; octx = octx->outer)
-		if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
-		    && (oclause = find_omp_clause
-				    (gimple_omp_for_clauses (octx->stmt),
-				     OMP_CLAUSE_ORDERED))
-		    && OMP_CLAUSE_ORDERED_EXPR (oclause) != NULL_TREE)
-		  {
-		    have_ordered = true;
-		    break;
-		  }
-	      if (!have_ordered)
+	      if (ctx == NULL
+		  || gimple_code (ctx->stmt) != GIMPLE_OMP_FOR
+		  || (oclause
+			= find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+					   OMP_CLAUSE_ORDERED)) == NULL_TREE
+		  || OMP_CLAUSE_ORDERED_EXPR (oclause) == NULL_TREE)
		{
		  error_at (OMP_CLAUSE_LOCATION (c),
			    "%<depend%> clause must be closely nested "
@@ -6724,7 +6727,8 @@ expand_omp_taskreg (struct omp_region *r
 static void
 expand_omp_for_init_counts (struct omp_for_data *fd, gimple_stmt_iterator *gsi,
			    basic_block &entry_bb, tree *counts,
-			    basic_block &zero_iter_bb, int &first_zero_iter,
+			    basic_block &zero_iter1_bb, int &first_zero_iter1,
+			    basic_block &zero_iter2_bb, int &first_zero_iter2,
			    basic_block &l2_dom_bb)
 {
   tree t, type = TREE_TYPE (fd->loop.v);
@@ -6737,6 +6741,7 @@ expand_omp_for_init_counts (struct omp_f
   if (gimple_omp_for_combined_into_p (fd->for_stmt)
       && TREE_CODE (fd->loop.n2) != INTEGER_CST)
     {
+      gcc_assert (fd->ordered == 0);
       /* First two _looptemp_ clauses are for istart/iend, counts[0]
	 isn't supposed to be handled, as the inner loop doesn't
	 use it.  */
@@ -6756,11 +6761,27 @@ expand_omp_for_init_counts (struct omp_f
       return;
     }
 
-  for (i = 0; i < fd->collapse; i++)
+  for (i = fd->collapse; i < fd->collapse + fd->ordered - 1; i++)
+    {
+      tree itype = TREE_TYPE (fd->loops[i].v);
+      counts[i] = NULL_TREE;
+      t = fold_binary (fd->loops[i].cond_code, boolean_type_node,
+		       fold_convert (itype, fd->loops[i].n1),
+		       fold_convert (itype, fd->loops[i].n2));
+      if (t && integer_zerop (t))
+	{
+	  for (i = fd->collapse; i < fd->collapse + fd->ordered - 1; i++)
+	    counts[i] = build_int_cst (type, 0);
+	  break;
+	}
+    }
+  for (i = 0; i < fd->collapse + (fd->ordered ? fd->ordered - 1 : 0); i++)
     {
       tree itype = TREE_TYPE (fd->loops[i].v);
 
-      if (SSA_VAR_P (fd->loop.n2)
+      if (i >= fd->collapse && counts[i])
+	continue;
+      if ((SSA_VAR_P (fd->loop.n2) || i >= fd->collapse)
	  && ((t = fold_binary (fd->loops[i].cond_code, boolean_type_node,
				fold_convert (itype, fd->loops[i].n1),
				fold_convert (itype, fd->loops[i].n2)))
@@ -6786,6 +6807,10 @@ expand_omp_for_init_counts (struct omp_f
	      gimple_regimplify_operands (cond_stmt, gsi);
	    }
	  e = split_block (entry_bb, cond_stmt);
+	  basic_block &zero_iter_bb
+	    = i < fd->collapse ? zero_iter1_bb : zero_iter2_bb;
+	  int &first_zero_iter
+	    = i < fd->collapse ? first_zero_iter1 : first_zero_iter2;
	  if (zero_iter_bb == NULL)
	    {
	      gassign *assign_stmt;
@@ -6793,8 +6818,15 @@ expand_omp_for_init_counts (struct omp_f
	      zero_iter_bb = create_empty_bb (entry_bb);
	      add_bb_to_loop (zero_iter_bb, entry_bb->loop_father);
	      *gsi = gsi_after_labels (zero_iter_bb);
-	      assign_stmt = gimple_build_assign (fd->loop.n2,
-						 build_zero_cst (type));
+	      if (i < fd->collapse)
+		assign_stmt = gimple_build_assign (fd->loop.n2,
+						   build_zero_cst (type));
+	      else
+		{
+		  counts[i] = create_tmp_reg (type, ".count");
+		  assign_stmt
+		    = gimple_build_assign (counts[i], build_zero_cst (type));
+		}
	      gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT);
	      set_immediate_dominator (CDI_DOMINATORS, zero_iter_bb,
				       entry_bb);
@@ -6838,10 +6870,11 @@ expand_omp_for_init_counts (struct omp_f
	counts[i] = t;
       else
	{
-	  counts[i] = create_tmp_reg (type, ".count");
+	  if (i < fd->collapse || i != first_zero_iter2)
+	    counts[i] = create_tmp_reg (type, ".count");
	  expand_omp_build_assign (gsi, counts[i], t);
	}
-      if (SSA_VAR_P (fd->loop.n2))
+      if (SSA_VAR_P (fd->loop.n2) && i < fd->collapse)
	{
	  if (i == 0)
	    t = counts[0];
@@ -7032,11 +7065,244 @@ extract_omp_for_update_vars (struct omp_
 }
 
+/* Expand #pragma omp ordered depend(source).  */
+
+static void
+expand_omp_ordered_source (gimple_stmt_iterator *gsi, struct omp_for_data *fd,
+			   tree *counts, location_t loc)
+{
+  auto_vec<tree, 10> args;
+  enum built_in_function source_ix = BUILT_IN_GOMP_DOACROSS_POST;
+  tree t;
+  int i;
+
+  for (i = fd->collapse - 1; i < fd->collapse + fd->ordered - 1; i++)
+    if (i == fd->collapse - 1 && fd->collapse > 1)
+      args.quick_push (fd->loop.v);
+    else if (counts[i])
+      args.safe_push (counts[i]);
+    else
+      {
+	t = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+			     fd->loops[i].v, fd->loops[i].n1);
+	t = fold_convert_loc (loc, fd->iter_type, t);
+	t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
+				      true, GSI_SAME_STMT);
+	args.safe_push (t);
+      }
+  gimple g = gimple_build_call_vec (builtin_decl_explicit (source_ix), args);
+  gimple_set_location (g, loc);
+  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+}
+
+/* Expand a single depend from #pragma omp ordered depend(sink:...).  */
+
+static void
+expand_omp_ordered_sink (gimple_stmt_iterator *gsi, struct omp_for_data *fd,
+			 tree *counts, tree c, location_t loc)
+{
+  auto_vec<tree, 10> args;
+  enum built_in_function sink_ix = BUILT_IN_GOMP_DOACROSS_WAIT;
+  tree t, off, coff = NULL_TREE, deps = OMP_CLAUSE_DECL (c), cond = NULL_TREE;
+  int i;
+  gimple_stmt_iterator gsi2 = *gsi;
+
+  gsi_prev (&gsi2);
+  edge e1 = split_block (gsi_bb (gsi2), gsi_stmt (gsi2));
+  edge e2 = split_block_after_labels (e1->dest);
+
+  *gsi = gsi_after_labels (e1->dest);
+  for (i = 0; i < fd->collapse + fd->ordered - 1; i++)
+    {
+      tree itype = TREE_TYPE (fd->loops[i].v);
+      if (POINTER_TYPE_P (itype))
+	itype = sizetype;
+      if (i)
+	deps = TREE_CHAIN (deps);
+      off = TREE_PURPOSE (deps);
+      tree s = fold_convert_loc (loc, itype, fd->loops[i].step);
+
+      if (integer_zerop (off))
+	t = boolean_true_node;
+      else
+	{
+	  tree a;
+	  tree co = fold_convert_loc (loc, itype, off);
+	  if (POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v)))
+	    a = fold_build2_loc (loc, POINTER_PLUS_EXPR,
+				 TREE_TYPE (fd->loops[i].v), fd->loops[i].v,
+				 co);
+	  else
+	    a = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				 fd->loops[i].v, co);
+	  if (!TYPE_UNSIGNED (itype)
+	      || POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v)))
+	    {
+	      if (fd->loops[i].cond_code == LT_EXPR)
+		{
+		  if (wi::neg_p (co))
+		    t = fold_build2_loc (loc, GE_EXPR, boolean_type_node, a,
+					 fd->loops[i].n1);
+		  else
+		    t = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a,
+					 fd->loops[i].n2);
+		}
+	      else if (wi::neg_p (co))
+		t = fold_build2_loc (loc, GT_EXPR, boolean_type_node, a,
+				     fd->loops[i].n2);
+	      else
+		t = fold_build2_loc (loc, LE_EXPR, boolean_type_node, a,
+				     fd->loops[i].n1);
+	    }
+	  else if (fd->loops[i].cond_code == LT_EXPR)
+	    {
+	      a = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				   a, fd->loops[i].n1);
+	      t = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				   fd->loops[i].n2, fd->loops[i].n1);
+	      t = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a, t);
+	    }
+	  else
+	    {
+	      a = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				   a, fd->loops[i].n2);
+	      a = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				   a,
+				   build_int_cst (TREE_TYPE (fd->loops[i].v),
+						  1));
+	      t = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+				   fd->loops[i].n1, fd->loops[i].n2);
+	      t = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a, t);
+	    }
+	}
+      if (cond)
+	cond = fold_build2_loc (loc, BIT_AND_EXPR, boolean_type_node, cond, t);
+      else
+	cond = t;
+
+      off = fold_convert_loc (loc, itype, off);
+
+      if (fd->loops[i].cond_code == LT_EXPR
+	  ? !integer_onep (fd->loops[i].step)
+	  : !integer_minus_onep (fd->loops[i].step))
+	{
+	  if (TYPE_UNSIGNED (itype) && fd->loops[i].cond_code == GT_EXPR)
+	    t = fold_build2_loc (loc, TRUNC_MOD_EXPR, itype,
+				 fold_build1_loc (loc, NEGATE_EXPR, itype,
+						  off),
+				 fold_build1_loc (loc, NEGATE_EXPR, itype,
+						  s));
+	  else
+	    t = fold_build2_loc (loc, TRUNC_MOD_EXPR, itype, off, s);
+	  t = fold_build2_loc (loc, EQ_EXPR, boolean_type_node, t,
+			       build_int_cst (itype, 0));
+	  cond = fold_build2_loc (loc, BIT_AND_EXPR, boolean_type_node,
+				  cond, t);
+	}
+
+      if (i <= fd->collapse - 1 && fd->collapse > 1)
+	t = fd->loop.v;
+      else if (counts[i])
+	t = counts[i];
+      else
+	{
+	  t = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+			       fd->loops[i].v, fd->loops[i].n1);
+	  t = fold_convert_loc (loc, fd->iter_type, t);
+	}
+      if (TYPE_UNSIGNED (itype) && fd->loops[i].cond_code == GT_EXPR)
+	off = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype,
+			       fold_build1_loc (loc, NEGATE_EXPR, itype,
+						off),
+			       fold_build1_loc (loc, NEGATE_EXPR, itype,
+						s));
+      else
+	off = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype, off, s);
+      off = fold_convert_loc (loc, fd->iter_type, off);
+      if (i <= fd->collapse - 1 && fd->collapse > 1)
+	{
+	  if (i)
+	    off = fold_build2_loc (loc, PLUS_EXPR, fd->iter_type, coff,
+				   off);
+	  if (i < fd->collapse - 1)
+	    {
+	      coff = fold_build2_loc (loc, MULT_EXPR, fd->iter_type, off,
+				      counts[i]);
+	      continue;
+	    }
+	}
+      off = unshare_expr (off);
+      t = fold_build2_loc (loc, PLUS_EXPR, fd->iter_type, t, off);
+      t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
+				    true, GSI_SAME_STMT);
+      args.safe_push (t);
+    }
+  gimple g = gimple_build_call_vec (builtin_decl_explicit (sink_ix), args);
+  gimple_set_location (g, loc);
+  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+
+  *gsi = gsi_last_bb (e1->src);
+  cond = unshare_expr (cond);
+  cond = force_gimple_operand_gsi (gsi, cond, true, NULL_TREE, false,
+				   GSI_CONTINUE_LINKING);
+  gsi_insert_after (gsi, gimple_build_cond_empty (cond), GSI_NEW_STMT);
+  edge e3 = make_edge (e1->src, e2->dest, EDGE_FALSE_VALUE);
+  e3->probability = REG_BR_PROB_BASE / 8;
+  e1->probability = REG_BR_PROB_BASE - e3->probability;
+  e1->flags = EDGE_TRUE_VALUE;
+  set_immediate_dominator (CDI_DOMINATORS, e2->dest, e1->src);
+
+  *gsi = gsi_after_labels (e2->dest);
+}
+
+/* Expand all #pragma omp ordered depend(source) and
+   #pragma omp ordered depend(sink:...) constructs in the current
+   #pragma omp for ordered(n) region.  */
+
+static void
+expand_omp_ordered_source_sink (struct omp_region *region,
+				struct omp_for_data *fd, tree *counts,
+				basic_block cont_bb)
+{
+  struct omp_region *inner;
+  int i;
+  for (i = fd->collapse - 1; i < fd->collapse + fd->ordered - 1; i++)
+    if (i == fd->collapse - 1 && fd->collapse > 1)
+      counts[i] = NULL_TREE;
+    else if (i >= fd->collapse && !cont_bb)
+      counts[i] = build_zero_cst (fd->iter_type);
+    else if (!POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v))
+	     && integer_onep (fd->loops[i].step))
+      counts[i] = NULL_TREE;
+    else
+      counts[i] = create_tmp_var (fd->iter_type, ".orditer");
+
+  for (inner = region->inner; inner; inner = inner->next)
+    if (inner->type == GIMPLE_OMP_ORDERED)
+      {
+	gomp_ordered *ord_stmt = inner->ord_stmt;
+	gimple_stmt_iterator gsi = gsi_for_stmt (ord_stmt);
+	location_t loc = gimple_location (ord_stmt);
+	tree c;
+	for (c = gimple_omp_ordered_clauses (ord_stmt);
+	     c; c = OMP_CLAUSE_CHAIN (c))
+	  if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE)
+	    break;
+	if (c)
+	  expand_omp_ordered_source (&gsi, fd, counts, loc);
+	for (c = gimple_omp_ordered_clauses (ord_stmt);
+	     c; c = OMP_CLAUSE_CHAIN (c))
+	  if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
+	    expand_omp_ordered_sink (&gsi, fd, counts, c, loc);
+	gsi_remove (&gsi, true);
+      }
+}
+
 /* Wrap the body into fd->ordered - 1 loops that aren't collapsed.  */
 
 static basic_block
-expand_omp_for_ordered_loops (struct omp_for_data *fd, basic_block cont_bb,
-			      basic_block body_bb)
+expand_omp_for_ordered_loops (struct omp_for_data *fd, tree *counts,
+			      basic_block cont_bb, basic_block body_bb)
 {
   if (fd->ordered <= 1)
     return cont_bb;
@@ -7059,10 +7325,13 @@ expand_omp_for_ordered_loops (struct omp
       gimple_stmt_iterator gsi = gsi_after_labels (body_bb);
       expand_omp_build_assign (&gsi, fd->loops[i].v,
			       fold_convert (type, fd->loops[i].n1));
+      if (counts[i])
+	expand_omp_build_assign (&gsi, counts[i],
+				 build_zero_cst (fd->iter_type));
       if (!gsi_end_p (gsi))
	gsi_prev (&gsi);
       else
-	gsi_last_bb (body_bb);
+	gsi = gsi_last_bb (body_bb);
       edge e1 = split_block (body_bb, gsi_stmt (gsi));
       basic_block new_body = e1->dest;
       if (body_bb == cont_bb)
@@ -7070,11 +7339,18 @@ expand_omp_for_ordered_loops (struct omp
       gsi = gsi_last_bb (cont_bb);
       if (POINTER_TYPE_P (type))
	t = fold_build_pointer_plus (fd->loops[i].v,
-				     fold_convert (sizetype, fd->loop.step));
+				     fold_convert (sizetype,
+						   fd->loops[i].step));
       else
	t = fold_build2 (PLUS_EXPR, type, fd->loops[i].v,
-			 fold_convert (type, fd->loop.step));
+			 fold_convert (type, fd->loops[i].step));
       expand_omp_build_assign (&gsi, fd->loops[i].v, t);
+      if (counts[i])
+	{
+	  t = fold_build2 (PLUS_EXPR, fd->iter_type, counts[i],
+			   build_int_cst (fd->iter_type, 1));
+	  expand_omp_build_assign (&gsi, counts[i], t);
+	}
       gsi_prev (&gsi);
       edge e2 = split_block (cont_bb, gsi_stmt (gsi));
       basic_block new_header = e2->dest;
@@ -7221,35 +7497,6 @@ expand_omp_for_generic (struct omp_regio
   gcc_assert (fd->iter_type == long_integer_type_node
	      || !in_combined_parallel);
 
-  type = TREE_TYPE (fd->loop.v);
-  istart0 = create_tmp_var (fd->iter_type, ".istart0");
-  iend0 = create_tmp_var (fd->iter_type, ".iend0");
-  TREE_ADDRESSABLE (istart0) = 1;
-  TREE_ADDRESSABLE (iend0) = 1;
-
-  /* See if we need to bias by LLONG_MIN.  */
-  if (fd->iter_type == long_long_unsigned_type_node
-      && TREE_CODE (type) == INTEGER_TYPE
-      && !TYPE_UNSIGNED (type))
-    {
-      tree n1, n2;
-
-      if (fd->loop.cond_code == LT_EXPR)
-	{
-	  n1 = fd->loop.n1;
-	  n2 = fold_build2 (PLUS_EXPR, type, fd->loop.n2, fd->loop.step);
-	}
-      else
-	{
-	  n1 = fold_build2 (MINUS_EXPR, type, fd->loop.n2, fd->loop.step);
-	  n2 = fd->loop.n1;
-	}
-      if (TREE_CODE (n1) != INTEGER_CST
-	  || TREE_CODE (n2) != INTEGER_CST
-	  || ((tree_int_cst_sgn (n1) < 0) ^ (tree_int_cst_sgn (n2) < 0)))
-	bias = fold_convert (fd->iter_type, TYPE_MIN_VALUE (type));
-    }
-
   entry_bb = region->entry;
   cont_bb = region->cont;
   collapse_bb = NULL;
@@ -7272,39 +7519,101 @@ expand_omp_for_generic (struct omp_regio
   gsi = gsi_last_bb (entry_bb);
   gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
-  if (fd->collapse > 1)
+  if (fd->collapse > 1 || fd->ordered)
     {
-      int first_zero_iter = -1;
-      basic_block zero_iter_bb = NULL, l2_dom_bb = NULL;
+      int first_zero_iter1 = -1, first_zero_iter2 = -1;
+      basic_block zero_iter1_bb = NULL, zero_iter2_bb = NULL, l2_dom_bb = NULL;
 
-      counts = XALLOCAVEC (tree, fd->collapse);
+      counts = XALLOCAVEC (tree, fd->collapse
+				 + (fd->ordered ? fd->ordered - 1 : 0));
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
-				  zero_iter_bb, first_zero_iter,
-				  l2_dom_bb);
+				  zero_iter1_bb, first_zero_iter1,
+				  zero_iter2_bb, first_zero_iter2, l2_dom_bb);
 
-      if (zero_iter_bb)
+      if (zero_iter1_bb)
	{
	  /* Some counts[i] vars might be uninitialized if
	     some loop has zero iterations.  But the body shouldn't
	     be executed in that case, so just avoid uninit warnings.  */
-	  for (i = first_zero_iter; i < fd->collapse; i++)
+	  for (i = first_zero_iter1;
+	       i < fd->collapse + (fd->ordered ? fd->ordered - 1 : 0); i++)
	    if (SSA_VAR_P (counts[i]))
	      TREE_NO_WARNING (counts[i]) = 1;
	  gsi_prev (&gsi);
	  e = split_block (entry_bb, gsi_stmt (gsi));
	  entry_bb = e->dest;
-	  make_edge (zero_iter_bb, entry_bb, EDGE_FALLTHRU);
+	  make_edge (zero_iter1_bb, entry_bb, EDGE_FALLTHRU);
	  gsi = gsi_last_bb (entry_bb);
	  set_immediate_dominator (CDI_DOMINATORS, entry_bb,
				   get_immediate_dominator (CDI_DOMINATORS,
-							    zero_iter_bb));
+							    zero_iter1_bb));
	}
+      if (zero_iter2_bb)
+	{
+	  /* Some counts[i] vars might be uninitialized if
+	     some loop has zero iterations.  But the body shouldn't
+	     be executed in that case, so just avoid uninit warnings.  */
+	  for (i = first_zero_iter2; i < fd->collapse + fd->ordered - 1; i++)
+	    if (SSA_VAR_P (counts[i]))
+	      TREE_NO_WARNING (counts[i]) = 1;
+	  if (zero_iter1_bb)
+	    make_edge (zero_iter2_bb, entry_bb, EDGE_FALLTHRU);
+	  else
+	    {
+	      gsi_prev (&gsi);
+	      e = split_block (entry_bb, gsi_stmt (gsi));
+	      entry_bb = e->dest;
+	      make_edge (zero_iter2_bb, entry_bb, EDGE_FALLTHRU);
+	      gsi = gsi_last_bb (entry_bb);
+	      set_immediate_dominator (CDI_DOMINATORS, entry_bb,
+				       get_immediate_dominator
+					 (CDI_DOMINATORS, zero_iter2_bb));
+	    }
+	}
+      if (fd->collapse == 1)
+	{
+	  counts[0] = fd->loop.n2;
+	  fd->loop = fd->loops[0];
+	}
     }
+
+  type = TREE_TYPE (fd->loop.v);
+  istart0 = create_tmp_var (fd->iter_type, ".istart0");
+  iend0 = create_tmp_var (fd->iter_type, ".iend0");
+  TREE_ADDRESSABLE (istart0) = 1;
+  TREE_ADDRESSABLE (iend0) = 1;
+
+  /* See if we need to bias by LLONG_MIN.  */
+  if (fd->iter_type == long_long_unsigned_type_node
+      && TREE_CODE (type) == INTEGER_TYPE
+      && !TYPE_UNSIGNED (type)
+      && fd->ordered == 0)
+    {
+      tree n1, n2;
+
+      if (fd->loop.cond_code == LT_EXPR)
+	{
+	  n1 = fd->loop.n1;
+	  n2 = fold_build2 (PLUS_EXPR, type, fd->loop.n2, fd->loop.step);
+	}
+      else
+	{
+	  n1 = fold_build2 (MINUS_EXPR, type, fd->loop.n2, fd->loop.step);
+	  n2 = fd->loop.n1;
+	}
+      if (TREE_CODE (n1) != INTEGER_CST
+	  || TREE_CODE (n2) != INTEGER_CST
+	  || ((tree_int_cst_sgn (n1) < 0) ^ (tree_int_cst_sgn (n2) < 0)))
+	bias = fold_convert (fd->iter_type, TYPE_MIN_VALUE (type));
+    }
+
   gimple_stmt_iterator gsif = gsi;
   gsi_prev (&gsif);
 
+  tree arr = NULL_TREE;
   if (in_combined_parallel)
     {
+      gcc_assert (fd->ordered == 0);
       /* In a combined parallel loop, emit a call to
	 GOMP_loop_foo_next.  */
       t = build_call_expr (builtin_decl_explicit (next_fn), 2,
@@ -7318,38 +7627,76 @@ expand_omp_for_generic (struct omp_regio
	 GOMP_loop_foo_start in ENTRY_BB.  */
       t4 = build_fold_addr_expr (iend0);
       t3 = build_fold_addr_expr (istart0);
-      t2 = fold_convert (fd->iter_type, fd->loop.step);
-      t1 = fd->loop.n2;
-      t0 = fd->loop.n1;
-      if (gimple_omp_for_combined_into_p (fd->for_stmt))
+      if (fd->ordered)
	{
-	  tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
-					 OMP_CLAUSE__LOOPTEMP_);
-	  gcc_assert (innerc);
-	  t0 = OMP_CLAUSE_DECL (innerc);
-	  innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
-				    OMP_CLAUSE__LOOPTEMP_);
-	  gcc_assert (innerc);
-	  t1 = OMP_CLAUSE_DECL (innerc);
-	}
-      if (POINTER_TYPE_P (TREE_TYPE (t0))
-	  && TYPE_PRECISION (TREE_TYPE (t0))
-	     != TYPE_PRECISION (fd->iter_type))
-	{
-	  /* Avoid casting pointers to integer of a different size.  */
-	  tree itype = signed_type_for (type);
-	  t1 = fold_convert (fd->iter_type, fold_convert (itype, t1));
-	  t0 = fold_convert (fd->iter_type, fold_convert (itype, t0));
+	  t0 = build_int_cst (unsigned_type_node, fd->ordered);
+	  arr = create_tmp_var (build_array_type_nelts (fd->iter_type,
+							fd->ordered),
+				".omp_counts");
+	  DECL_NAMELESS (arr) = 1;
+	  TREE_ADDRESSABLE (arr) = 1;
+	  TREE_STATIC (arr) = 1;
+	  vec<constructor_elt, va_gc> *v;
+	  vec_alloc (v, fd->ordered);
+	  int idx;
+
+	  for (idx = 0; idx < fd->ordered; idx++)
+	    {
+	      tree c;
+	      if (idx == 0 && fd->collapse > 1)
+		c = fd->loop.n2;
+	      else
+		c = counts[idx + fd->collapse - 1];
+	      tree purpose = size_int (idx);
+	      CONSTRUCTOR_APPEND_ELT (v, purpose, c);
+	      if (TREE_CODE (c) != INTEGER_CST)
+		TREE_STATIC (arr) = 0;
+	    }
+
+	  DECL_INITIAL (arr) = build_constructor (TREE_TYPE (arr), v);
+	  if (!TREE_STATIC (arr))
+	    force_gimple_operand_gsi (&gsi, build1 (DECL_EXPR,
+						    void_type_node, arr),
+				      true, NULL_TREE, true, GSI_SAME_STMT);
+	  t1 = build_fold_addr_expr (arr);
+	  t2 = NULL_TREE;
	}
       else
	{
-	  t1 = fold_convert (fd->iter_type, t1);
-	  t0 = fold_convert (fd->iter_type, t0);
-	}
-      if (bias)
-	{
-	  t1 = fold_build2 (PLUS_EXPR, fd->iter_type, t1, bias);
-	  t0 = fold_build2 (PLUS_EXPR, fd->iter_type, t0, bias);
+	  t2 = fold_convert (fd->iter_type, fd->loop.step);
+	  t1 = fd->loop.n2;
+	  t0 = fd->loop.n1;
+	  if (gimple_omp_for_combined_into_p (fd->for_stmt))
+	    {
+	      tree innerc
+		= find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
+				   OMP_CLAUSE__LOOPTEMP_);
+	      gcc_assert (innerc);
+	      t0 = OMP_CLAUSE_DECL (innerc);
+	      innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
+					OMP_CLAUSE__LOOPTEMP_);
+	      gcc_assert (innerc);
+	      t1 = OMP_CLAUSE_DECL (innerc);
+	    }
+	  if (POINTER_TYPE_P (TREE_TYPE (t0))
+	      && TYPE_PRECISION (TREE_TYPE (t0))
+		 != TYPE_PRECISION (fd->iter_type))
+	    {
+	      /* Avoid casting pointers to integer of a different
+		 size.  */
+	      tree itype = signed_type_for (type);
+	      t1 = fold_convert (fd->iter_type, fold_convert (itype, t1));
+	      t0 = fold_convert (fd->iter_type, fold_convert (itype, t0));
+	    }
+	  else
+	    {
+	      t1 = fold_convert (fd->iter_type, t1);
+	      t0 = fold_convert (fd->iter_type, t0);
+	    }
+	  if (bias)
+	    {
+	      t1 = fold_build2 (PLUS_EXPR, fd->iter_type, t1, bias);
+	      t0 = fold_build2 (PLUS_EXPR, fd->iter_type, t0, bias);
+	    }
	}
       if (fd->iter_type == long_integer_type_node)
	{
@@ -7357,9 +7704,16 @@ expand_omp_for_generic (struct omp_regio
	    {
	      t = fold_convert (fd->iter_type, fd->chunk_size);
	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
-	      t = build_call_expr (builtin_decl_explicit (start_fn),
-				   6, t0, t1, t2, t, t3, t4);
+	      if (fd->ordered)
+		t = build_call_expr (builtin_decl_explicit (start_fn),
+				     5, t0, t1, t, t3, t4);
+	      else
+		t = build_call_expr (builtin_decl_explicit (start_fn),
+				     6, t0, t1, t2, t, t3, t4);
	    }
+	  else if (fd->ordered)
+	    t = build_call_expr (builtin_decl_explicit (start_fn),
+				 4, t0, t1, t3, t4);
	  else
	    t = build_call_expr (builtin_decl_explicit (start_fn),
				 5, t0, t1, t2, t3, t4);
@@ -7383,8 +7737,14 @@ expand_omp_for_generic (struct omp_regio
	      tree bfn_decl = builtin_decl_explicit (start_fn);
	      t = fold_convert (fd->iter_type, fd->chunk_size);
	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
-	      t = build_call_expr (bfn_decl, 7, t5, t0, t1, t2, t, t3, t4);
+	      if (fd->ordered)
+		t = build_call_expr (bfn_decl, 6, t5, t0, t1, t, t3, t4);
+	      else
+		t = build_call_expr (bfn_decl, 7, t5, t0, t1, t2, t, t3, t4);
	    }
+	  else if (fd->ordered)
+	    t = build_call_expr (builtin_decl_explicit (start_fn),
+				 5, t5, t0, t1, t3, t4);
	  else
	    t = build_call_expr (builtin_decl_explicit (start_fn),
				 6, t5, t0, t1, t2, t3, t4);
@@ -7395,6 +7755,13 @@ expand_omp_for_generic (struct omp_regio
				t, build_int_cst (TREE_TYPE (t), 0));
   t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
				true, GSI_SAME_STMT);
+  if (arr && !TREE_STATIC (arr))
+    {
+      tree clobber = build_constructor (TREE_TYPE (arr), NULL);
+      TREE_THIS_VOLATILE (clobber) = 1;
+      gsi_insert_before (&gsi, gimple_build_assign (arr, clobber),
+			 GSI_SAME_STMT);
+    }
   gsi_insert_after (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
 
   /* Remove the GIMPLE_OMP_FOR statement.  */
@@ -7425,11 +7792,29 @@ expand_omp_for_generic (struct omp_regio
       gsi = gsi_start_bb (l0_bb);
       t = istart0;
-      if (bias)
+      if (fd->ordered && fd->collapse == 1)
+	t = fold_build2 (MULT_EXPR, fd->iter_type, t,
+			 fold_convert (fd->iter_type, fd->loop.step));
+      else if (bias)
	t = fold_build2 (MINUS_EXPR, fd->iter_type, t, bias);
-      if (POINTER_TYPE_P (TREE_TYPE (startvar)))
-	t = fold_convert (signed_type_for (TREE_TYPE (startvar)), t);
-      t = fold_convert (TREE_TYPE (startvar), t);
+      if (fd->ordered && fd->collapse == 1)
+	{
+	  if (POINTER_TYPE_P (TREE_TYPE (startvar)))
+	    t = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (startvar),
+			     fd->loop.n1, fold_convert (sizetype, t));
+	  else
+	    {
+	      t = fold_convert (TREE_TYPE (startvar), t);
+	      t = fold_build2 (PLUS_EXPR, TREE_TYPE (startvar),
+			       fd->loop.n1, t);
+	    }
+	}
+      else
+	{
+	  if (POINTER_TYPE_P (TREE_TYPE (startvar)))
+	    t = fold_convert (signed_type_for (TREE_TYPE (startvar)), t);
+	  t = fold_convert (TREE_TYPE (startvar), t);
+	}
       t = force_gimple_operand_gsi (&gsi, t,
				    DECL_P (startvar)
				    && TREE_ADDRESSABLE (startvar),
@@ -7438,11 +7823,29 @@ expand_omp_for_generic (struct omp_regio
   gsi_insert_after (&gsi, assign_stmt, GSI_CONTINUE_LINKING);
 
   t = iend0;
-  if (bias)
+  if (fd->ordered && fd->collapse == 1)
+    t = fold_build2 (MULT_EXPR, fd->iter_type, t,
+		     fold_convert (fd->iter_type, fd->loop.step));
+  else if (bias)
     t = fold_build2 (MINUS_EXPR, fd->iter_type, t, bias);
-  if (POINTER_TYPE_P (TREE_TYPE (startvar)))
-    t = fold_convert (signed_type_for (TREE_TYPE (startvar)), t);
-  t = fold_convert (TREE_TYPE (startvar), t);
+  if (fd->ordered && fd->collapse == 1)
+    {
+      if (POINTER_TYPE_P (TREE_TYPE (startvar)))
+	t = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (startvar),
+			 fd->loop.n1, fold_convert (sizetype, t));
+      else
+	{
+	  t = fold_convert (TREE_TYPE (startvar), t);
+	  t = fold_build2 (PLUS_EXPR, TREE_TYPE (startvar),
+			   fd->loop.n1, t);
+	}
+    }
+  else
+    {
+      if (POINTER_TYPE_P (TREE_TYPE (startvar)))
+	t = fold_convert (signed_type_for (TREE_TYPE (startvar)), t);
+      t = fold_convert (TREE_TYPE (startvar), t);
+    }
   iend = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
				   false, GSI_CONTINUE_LINKING);
   if (endvar)
@@ -7506,7 +7909,19 @@ expand_omp_for_generic (struct omp_regio
   if (fd->collapse > 1)
     expand_omp_for_init_vars (fd, &gsi, counts, inner_stmt, startvar);
 
-  cont_bb = expand_omp_for_ordered_loops (fd, cont_bb, l1_bb);
+  if (fd->ordered)
+    expand_omp_ordered_source_sink (region, fd, counts, cont_bb);
+  cont_bb = expand_omp_for_ordered_loops (fd, counts, cont_bb, l1_bb);
+  if (fd->ordered && counts[fd->collapse - 1])
+    {
+      gcc_assert (fd->collapse == 1);
+      gsi = gsi_last_bb (l0_bb);
+      expand_omp_build_assign (&gsi, counts[fd->collapse - 1], istart0, true);
+      gsi = gsi_last_bb (cont_bb);
+      t = fold_build2 (PLUS_EXPR, fd->iter_type, counts[fd->collapse - 1],
+		       build_int_cst (fd->iter_type, 1));
+      expand_omp_build_assign (&gsi, counts[fd->collapse - 1], t);
+    }
 
   if (!broken_loop)
     {
@@ -7728,13 +8143,13 @@ expand_omp_for_static_nochunk (struct om
 
   if (fd->collapse > 1)
     {
-      int first_zero_iter = -1;
-      basic_block l2_dom_bb = NULL;
+      int first_zero_iter = -1, dummy = -1;
+      basic_block l2_dom_bb = NULL, dummy_bb = NULL;
 
       counts = XALLOCAVEC (tree, fd->collapse);
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
				  fin_bb, first_zero_iter,
-				  l2_dom_bb);
+				  dummy_bb, dummy, l2_dom_bb);
       t = NULL_TREE;
     }
   else if (gimple_omp_for_combined_into_p (fd->for_stmt))
@@ -8199,13 +8614,13 @@ expand_omp_for_static_chunk (struct omp_
 
   if (fd->collapse > 1)
     {
-      int first_zero_iter = -1;
-      basic_block l2_dom_bb = NULL;
+      int first_zero_iter = -1, dummy = -1;
+      basic_block l2_dom_bb = NULL, dummy_bb = NULL;
 
       counts = XALLOCAVEC (tree, fd->collapse);
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
				  fin_bb, first_zero_iter,
-				  l2_dom_bb);
+				  dummy_bb, dummy, l2_dom_bb);
       t = NULL_TREE;
     }
   else if (gimple_omp_for_combined_into_p (fd->for_stmt))
@@ -8978,13 +9393,13 @@ expand_omp_simd (struct omp_region *regi
   gcc_assert (!gimple_in_ssa_p (cfun));
   if (fd->collapse > 1)
     {
-      int first_zero_iter = -1;
-      basic_block zero_iter_bb = l2_bb;
+      int first_zero_iter = -1, dummy = -1;
+      basic_block zero_iter_bb = l2_bb, dummy_bb = NULL;
 
       counts = XALLOCAVEC (tree, fd->collapse);
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
				  zero_iter_bb, first_zero_iter,
-				  l2_dom_bb);
+				  dummy_bb, dummy, l2_dom_bb);
     }
   if (l2_dom_bb == NULL)
     l2_dom_bb = l1_bb;
@@ -9244,13 +9659,13 @@ expand_omp_taskloop_for_outer (struct om
   gcc_assert (gimple_code (for_stmt) == GIMPLE_OMP_FOR);
   if (fd->collapse > 1)
     {
-      int first_zero_iter = -1;
-      basic_block zero_iter_bb = NULL, l2_dom_bb = NULL;
+      int first_zero_iter = -1, dummy = -1;
+      basic_block zero_iter_bb = NULL, dummy_bb = NULL, l2_dom_bb = NULL;
 
       counts = XALLOCAVEC (tree, fd->collapse);
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
				  zero_iter_bb, first_zero_iter,
-				  l2_dom_bb);
+				  dummy_bb, dummy, l2_dom_bb);
 
       if (zero_iter_bb)
	{
@@ -9422,13 +9837,13 @@ expand_omp_taskloop_for_inner (struct om
 
   if (fd->collapse > 1)
     {
-      int first_zero_iter = -1;
-      basic_block l2_dom_bb = NULL;
+      int first_zero_iter = -1, dummy = -1;
+      basic_block l2_dom_bb = NULL, dummy_bb = NULL;
 
       counts = XALLOCAVEC (tree, fd->collapse);
       expand_omp_for_init_counts (fd, &gsi, entry_bb, counts,
				  fin_bb, first_zero_iter,
-				  l2_dom_bb);
+				  dummy_bb, dummy, l2_dom_bb);
       t = NULL_TREE;
     }
   else
@@ -9643,8 +10058,12 @@ expand_omp_for (struct omp_region *regio
	  gcc_assert (fd.sched_kind != OMP_CLAUSE_SCHEDULE_AUTO);
	  fn_index = (fd.sched_kind == OMP_CLAUSE_SCHEDULE_RUNTIME)
		     ? 3 : fd.sched_kind;
-	  fn_index += fd.have_ordered * 4;
-	  start_ix = ((int)BUILT_IN_GOMP_LOOP_STATIC_START) + fn_index;
+	  if (!fd.ordered)
+	    fn_index += fd.have_ordered * 4;
+	  if (fd.ordered)
+	    start_ix = ((int)BUILT_IN_GOMP_LOOP_DOACROSS_STATIC_START)
+		       + fn_index;
+	  else
+	    start_ix = ((int)BUILT_IN_GOMP_LOOP_STATIC_START) + fn_index;
	  next_ix = ((int)BUILT_IN_GOMP_LOOP_STATIC_NEXT) + fn_index;
	  if (fd.iter_type == long_long_unsigned_type_node)
	    {
@@ -11081,9 +11500,24 @@ expand_omp (struct omp_region *region)
	  expand_omp_single (region);
	  break;
 
+	case GIMPLE_OMP_ORDERED:
+	  {
+	    gomp_ordered *ord_stmt
+	      = as_a <gomp_ordered *> (last_stmt (region->entry));
+	    if (find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
+				 OMP_CLAUSE_DEPEND))
+	      {
+		/* We'll expand these when expanding corresponding
+		   worksharing region with ordered(n) clause.  */
+		gcc_assert (region->outer
+			    && region->outer->type == GIMPLE_OMP_FOR);
+		region->ord_stmt = ord_stmt;
+		break;
+	      }
+	  }
+	  /* FALLTHRU */
	case GIMPLE_OMP_MASTER:
	case GIMPLE_OMP_TASKGROUP:
-	case GIMPLE_OMP_ORDERED:
	case GIMPLE_OMP_CRITICAL:
	case GIMPLE_OMP_TEAMS:
	  expand_omp_synch (region);
@@ -12176,7 +12610,7 @@ lower_omp_ordered_clauses (gimple_stmt_i
      such dependencies are known to be executed by the same thread.
 
      We take into account the direction of the loop, so a minimum
-     becomes a maximum if the loop is iterating backwards.  We also
+     becomes a maximum if the loop is iterating forwards.  We also
     ignore sink clauses where the loop direction is unknown, or where
     the offsets are clearly invalid because they are not a multiple
     of the loop increment.
@@ -12188,73 +12622,65 @@ lower_omp_ordered_clauses (gimple_stmt_i
	for (j=0; j < M; ++j)
	  {
	    #pragma omp ordered \
-	      depend(sink:i-8,j-1) \
-	      depend(sink:i,j-2) \	// Completely ignored because i+0.
-	      depend(sink:i-4,j+3) \
-	      depend(sink:i-6,j+2)
+	      depend(sink:i-8,j-2) \
+	      depend(sink:i,j-1) \	// Completely ignored because i+0.
+	      depend(sink:i-4,j-3) \
+	      depend(sink:i-6,j-4)
	    #pragma omp ordered depend(source)
	  }
 
     Folded clause is:
 
-	depend(sink:-gcd(8,4,6),min(-1,3,2))
+	depend(sink:-gcd(8,4,6),-min(2,3,4))
	  -or-
-	depend(sink:-2,-1)
+	depend(sink:-2,-2)
   */
 
   /* FIXME: Computing GCD's where the first element is zero is
      non-trivial in the presence of collapsed loops.  Do this later.  */
-  gcc_assert (fd.collapse <= 1);
+  if (fd.collapse > 1)
+    return;
 
-  vec<wide_int> folded_deps;
-  folded_deps.create (len);
-  folded_deps.quick_grow_cleared (len);
-  /* Bitmap representing dimensions in the final dependency vector that
-     have been set.  */
-  sbitmap folded_deps_used = sbitmap_alloc (len);
-  bitmap_clear (folded_deps_used);
+  wide_int *folded_deps = XALLOCAVEC (wide_int, 2 * len - 1);
+  memset (folded_deps, 0, sizeof (*folded_deps) * (2 * len - 1));
+  tree folded_dep = NULL_TREE;
 
   /* TRUE if the first dimension's offset is negative.  */
   bool neg_offset_p = false;
 
-  /* ?? We need to save the original iteration variables stored in the
-     depend clauses, because those in fd.loops[].v have already been
-     gimplified.  Perhaps we should use the gimplified versions.  ?? */
-  tree *iter_vars = (tree *) alloca (sizeof (tree) * len);
-  memset (iter_vars, 0, sizeof (tree) * len);
-
   list_p = gimple_omp_ordered_clauses_ptr (ord_stmt);
   unsigned int i;
 
   while ((c = *list_p) != NULL)
     {
       bool remove = false;
 
-      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
-	  || OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_SINK)
+      gcc_assert (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND);
+      if (OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_SINK)
	goto next_ordered_clause;
 
-      tree decls;
-      for (decls = OMP_CLAUSE_DECL (c), i = 0;
-	   decls && TREE_CODE (decls) == TREE_LIST;
-	   decls = TREE_CHAIN (decls), ++i)
+      tree vec;
+      for (vec = OMP_CLAUSE_DECL (c), i = 0;
+	   vec && TREE_CODE (vec) == TREE_LIST;
+	   vec = TREE_CHAIN (vec), ++i)
	{
	  gcc_assert (i < len);
 
	  /* extract_omp_for_data has canonicalized the condition.  */
	  gcc_assert (fd.loops[i].cond_code == LT_EXPR
-		      || fd.loops[i].cond_code == LE_EXPR
-		      || fd.loops[i].cond_code == GT_EXPR
-		      || fd.loops[i].cond_code == GE_EXPR);
-	  bool forward = fd.loops[i].cond_code == LT_EXPR
-	    || fd.loops[i].cond_code == LE_EXPR;
+		      || fd.loops[i].cond_code == GT_EXPR);
+	  bool forward = fd.loops[i].cond_code == LT_EXPR;
+	  bool maybe_lexically_later = true;
 
	  /* While the committee makes up its mind, bail if we have any
	     non-constant steps.  */
	  if (TREE_CODE (fd.loops[i].step) != INTEGER_CST)
	    goto lower_omp_ordered_ret;
 
-	  wide_int offset = TREE_PURPOSE (decls);
-	  if (!iter_vars[i])
-	    iter_vars[i] = TREE_VALUE (decls);
+	  tree itype = TREE_TYPE (TREE_VALUE (vec));
+	  if (POINTER_TYPE_P (itype))
+	    itype = sizetype;
+	  wide_int offset = wide_int::from (TREE_PURPOSE (vec),
+					    TYPE_PRECISION (itype),
+					    TYPE_SIGN (itype));
 
	  /* Ignore invalid offsets that are not multiples of the step.  */
	  if (!wi::multiple_of_p
@@ -12282,40 +12708,49 @@ lower_omp_ordered_clauses (gimple_stmt_i
	    }
	  else
	    {
-	      neg_offset_p =
-		wi::neg_p (offset,
-			   TYPE_SIGN (TREE_TYPE (TREE_PURPOSE (decls))));
-	      if ((forward && !neg_offset_p)
-		  || (!forward && neg_offset_p))
+	      if (!TYPE_UNSIGNED (itype) && (forward ^ wi::neg_p (offset)))
		{
		  error_at (OMP_CLAUSE_LOCATION (c),
			    "first offset must be in opposite direction "
			    "of loop iterations");
		  goto lower_omp_ordered_ret;
		}
+	      if (forward)
+		offset = -offset;
+	      neg_offset_p = forward;
	      /* Initialize the first time around.  */
-	      if (!bitmap_bit_p (folded_deps_used, 0))
+	      if (folded_dep == NULL_TREE)
		{
-		  bitmap_set_bit (folded_deps_used, 0);
-		  folded_deps[0] = wi::abs (offset);
+		  folded_dep = c;
+		  folded_deps[0] = offset;
		}
	      else
-		folded_deps[i] = wi::gcd (folded_deps[0], offset, UNSIGNED);
+		folded_deps[0] = wi::gcd (folded_deps[0],
+					  offset, UNSIGNED);
	    }
	  /* Calculate minimum for the remaining dimensions.  */
	  else
	    {
-	      if (!bitmap_bit_p (folded_deps_used, i))
+	      folded_deps[len + i - 1] = offset;
+	      if (folded_dep == c)
+		folded_deps[i] = offset;
+	      else if (maybe_lexically_later
+		       && !wi::eq_p (folded_deps[i], offset))
		{
-		  bitmap_set_bit (folded_deps_used, i);
-		  folded_deps[i] = offset;
+		  if (forward ^ wi::gts_p (folded_deps[i], offset))
+		    {
+		      unsigned int j;
+		      folded_dep = c;
+		      for (j = 1; j <= i; j++)
+			folded_deps[j] = folded_deps[len + j - 1];
+		    }
+		  else
+		    maybe_lexically_later = false;
		}
-	      else if ((forward && wi::lts_p (offset, folded_deps[i]))
-		       || (!forward && wi::gts_p (offset, folded_deps[i])))
-		folded_deps[i] = offset;
	    }
	}
+      gcc_assert (i == len);
 
       remove = true;
 
@@ -12326,35 +12761,22 @@ lower_omp_ordered_clauses (gimple_stmt_i
	  list_p = &OMP_CLAUSE_CHAIN (c);
	}
 
-  for (i = 0; i < len; ++i)
-    if (!bitmap_bit_p (folded_deps_used, i))
-      break;
-  if (i == len)
+  if (folded_dep)
     {
       if (neg_offset_p)
	folded_deps[0] = -folded_deps[0];
 
-      tree vec = NULL;
-      i = len;
-      do
-	{
-	  i--;
-	  vec = tree_cons (wide_int_to_tree (TREE_TYPE (fd.loops[i].v),
-					     folded_deps[i]),
-			   iter_vars[i], vec);
-	}
-      while (i);
-
-      c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_DEPEND);
-      OMP_CLAUSE_DEPEND_KIND (c) = OMP_CLAUSE_DEPEND_SINK;
-      OMP_CLAUSE_DECL (c) = vec;
-      OMP_CLAUSE_CHAIN (c) = gimple_omp_ordered_clauses (ord_stmt);
-      *gimple_omp_ordered_clauses_ptr (ord_stmt) = c;
+      tree itype = TREE_TYPE (TREE_VALUE (OMP_CLAUSE_DECL (folded_dep)));
+      if (POINTER_TYPE_P (itype))
+	itype = sizetype;
+
+      TREE_PURPOSE (OMP_CLAUSE_DECL (folded_dep))
+	= wide_int_to_tree (itype, folded_deps[0]);
+      OMP_CLAUSE_CHAIN (folded_dep) = gimple_omp_ordered_clauses (ord_stmt);
+      *gimple_omp_ordered_clauses_ptr (ord_stmt) = folded_dep;
     }
 
 lower_omp_ordered_ret:
-  sbitmap_free (folded_deps_used);
-  folded_deps.release ();
 
   /* Ordered without clauses is #pragma omp threads, while we want
     a nop instead if we remove all clauses.  */
--- gcc/omp-builtins.def.jj	2015-09-02 12:51:00.000000000 +0200
+++ gcc/omp-builtins.def	2015-09-17 09:23:39.904444459 +0200
@@ -129,6 +129,22 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ORD
 		  "GOMP_loop_ordered_runtime_start",
 		  BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
 		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_DOACROSS_STATIC_START,
+		  "GOMP_loop_doacross_static_start",
+		  BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
+		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_DOACROSS_DYNAMIC_START,
+		  "GOMP_loop_doacross_dynamic_start",
+		  BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
+		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_DOACROSS_GUIDED_START,
+		  "GOMP_loop_doacross_guided_start",
+		  BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
+		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_DOACROSS_RUNTIME_START,
+		  "GOMP_loop_doacross_runtime_start",
+		  BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
+		  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_STATIC_NEXT, "GOMP_loop_static_next",
 		  BT_FN_BOOL_LONGPTR_LONGPTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_DYNAMIC_NEXT, "GOMP_loop_dynamic_next",
@@ -230,6 +246,10 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ORDERED_
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ORDERED_END, "GOMP_ordered_end", BT_FN_VOID,
 		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_DOACROSS_POST, "GOMP_doacross_post",
+		  BT_FN_VOID_LONG_VAR, ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_DOACROSS_WAIT, "GOMP_doacross_wait",
+		  BT_FN_VOID_LONG_VAR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL, "GOMP_parallel",
 		  BT_FN_VOID_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK, "GOMP_task",
--- gcc/builtin-types.def.jj	2015-09-02 12:51:51.000000000 +0200
+++ gcc/builtin-types.def	2015-09-17 09:24:53.776384307 +0200
@@ -473,6 +473,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_VPT
 		     BT_VOLATILE_PTR, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_CONST_VPTR_PTR_INT, BT_VOID, BT_SIZE,
 		     BT_CONST_VOLATILE_PTR, BT_PTR, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
+		     BT_BOOL, BT_UINT, BT_PTR_LONG, BT_PTR_LONG, BT_PTR_LONG)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
 		     BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,
@@ -497,6 +499,9 @@ DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
 		     BT_UINT)
+DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
+		     BT_BOOL, BT_UINT, BT_PTR_LONG, BT_LONG, BT_PTR_LONG,
+		     BT_PTR_LONG)
 
 DEF_FUNCTION_TYPE_6 (BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VALIST_ARG,
 		     BT_INT, BT_STRING, BT_SIZE, BT_INT, BT_SIZE,
@@ -571,6 +576,8 @@ DEF_FUNCTION_TYPE_VAR_1 (BT_FN_INT_CONST
 			 BT_INT, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_VAR_1 (BT_FN_UINT32_UINT32_VAR,
 			 BT_UINT32, BT_UINT32)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_LONG_VAR,
+			 BT_VOID, BT_LONG)
 
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_FILEPTR_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_CONST_STRING)
--- gcc/fortran/types.def.jj	2015-09-02 12:52:20.000000000 +0200
+++ gcc/fortran/types.def	2015-09-17 09:31:11.020977009 +0200
@@ -154,6 +154,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_VPT
 		     BT_VOLATILE_PTR, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_CONST_VPTR_PTR_INT, BT_VOID, BT_SIZE,
 		     BT_CONST_VOLATILE_PTR, BT_PTR, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
+		     BT_BOOL, BT_UINT, BT_PTR_LONG, BT_PTR_LONG, BT_PTR_LONG)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
@@ -165,6 +167,9 @@ DEF_FUNCTION_TYPE_5 (BT_FN_VOID_SIZE_VPT
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR,
 		     BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR,
+		     BT_BOOL, BT_UINT, BT_PTR_LONG, BT_LONG, BT_PTR_LONG,
+		     BT_PTR_LONG)
 
 DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
@@ -225,6 +230,9 @@ DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_P
 
 DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID)
 
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_LONG_VAR,
+			 BT_VOID, BT_LONG)
+
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_VOID_INT_INT_VAR, BT_VOID, BT_INT, BT_INT)
 
 DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
--- gcc/fortran/f95-lang.c.jj	2015-09-03 16:39:12.000000000 +0200
+++ gcc/fortran/f95-lang.c	2015-09-15 15:02:51.502042179 +0200
@@ -640,6 +640,7 @@ gfc_init_builtin_functions (void)
 #define DEF_FUNCTION_TYPE_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
			     ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
+#define DEF_FUNCTION_TYPE_VAR_1(NAME, RETURN, ARG1) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
			       ARG6, ARG7) NAME,
@@ -661,6 +662,7 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_10
 #undef DEF_FUNCTION_TYPE_11
 #undef DEF_FUNCTION_TYPE_VAR_0
+#undef DEF_FUNCTION_TYPE_VAR_1
 #undef DEF_FUNCTION_TYPE_VAR_2
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
@@ -1144,6 +1146,11 @@ gfc_init_builtin_functions (void)
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],	\
					 NULL_TREE);
+#define DEF_FUNCTION_TYPE_VAR_1(ENUM, RETURN, ARG1)			\
+  builtin_types[(int) ENUM]						\
+    = build_varargs_function_type_list (builtin_types[(int) RETURN],	\
+					 builtin_types[(int) ARG1],	\
+					 NULL_TREE);
 #define DEF_FUNCTION_TYPE_VAR_2(ENUM, RETURN, ARG1, ARG2)		\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],	\
@@ -1194,6 +1201,7 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_10
 #undef DEF_FUNCTION_TYPE_VAR_0
+#undef DEF_FUNCTION_TYPE_VAR_1
 #undef DEF_FUNCTION_TYPE_VAR_2
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
--- gcc/testsuite/c-c++-common/gomp/sink-4.c.jj	2015-08-24 14:32:06.000000000 +0200
+++ gcc/testsuite/c-c++-common/gomp/sink-4.c	2015-09-18 18:14:02.786996784 +0200
@@ -22,4 +22,4 @@ funk (foo *begin, foo *end)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400.\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400\\)" 1 "omplower" } } */
--- gcc/testsuite/gcc.dg/gomp/sink-fold-1.c.jj	2015-08-24 14:32:06.000000000 +0200
+++ gcc/testsuite/gcc.dg/gomp/sink-fold-1.c	2015-09-18 18:34:26.234773145 +0200
@@ -3,28 +3,29 @@
 
 /* Test depend(sink) clause folding.  */
 
-int i,j, N;
+int i,j,k, N;
 
 extern void bar();
 
 void
 funk ()
 {
-#pragma omp parallel for ordered(2)
+#pragma omp parallel for ordered(3)
   for (i=0; i < N; i++)
     for (j=0; j < N; ++j)
+      for (k=0; k < N; ++k)
       {
-/* We remove the (sink:i,j-2) by virtue of it the i+0.  The remaining
-   clauses get folded with a GCD of -2 for `i' and a minimum of -1 for
-   'j'.  */
+/* We remove the (sink:i,j-1,k) by virtue of it the i+0.  The remaining
+   clauses get folded with a GCD of -2 for `i' and a maximum of -2, +2 for
+   'j' and 'k'.  */
 #pragma omp ordered \
-	depend(sink:i-8,j-1) \
-	depend(sink:i, j-2) \
-	depend(sink:i-4,j+3) \
-	depend(sink:i-6,j+2)
+	depend(sink:i-8,j-2,k+2) \
+	depend(sink:i, j-1,k) \
+	depend(sink:i-4,j-3,k+6) \
+	depend(sink:i-6,j-4,k-6)
	bar();
 #pragma omp ordered depend(source)
       }
 }
 
-/* { dg-final { scan-tree-dump-times "omp ordered depend\\(sink:i-2,j-1\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "omp ordered depend\\(sink:i-2,j-2,k\\+2\\)" 1 "omplower" } } */
--- gcc/testsuite/gcc.dg/gomp/sink-fold-3.c.jj	2015-08-24 14:32:06.000000000 +0200
+++ gcc/testsuite/gcc.dg/gomp/sink-fold-3.c	2015-09-18 18:21:10.576989633 +0200
@@ -22,4 +22,4 @@ funk (foo *begin, foo *end)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+800B\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+800\\)" 1 "omplower" } } */
--- libgomp/libgomp.map.jj	2015-09-03 16:42:25.000000000 +0200
+++ libgomp/libgomp.map	2015-09-18 18:12:29.569305773 +0200
@@ -274,6 +274,12 @@ GOMP_4.1 {
	GOMP_taskloop_ull;
	GOMP_offload_register_ver;
	GOMP_offload_unregister_ver;
+	GOMP_loop_doacross_dynamic_start;
+	GOMP_loop_doacross_guided_start;
+	GOMP_loop_doacross_runtime_start;
+	GOMP_loop_doacross_static_start;
+	GOMP_doacross_post;
+	GOMP_doacross_wait;
 } GOMP_4.0.1;
 
 OACC_2.0 {
--- libgomp/ordered.c.jj	2015-04-24 12:30:40.000000000 +0200
+++ libgomp/ordered.c	2015-09-18 18:36:42.053857644 +0200
@@ -26,6 +26,7 @@
 /* This file handles the ORDERED construct.  */
 
 #include "libgomp.h"
+#include <stdarg.h>
 
 
 /* This function is called when first allocating an iteration block.  That
@@ -250,3 +251,23 @@ void
 GOMP_ordered_end (void)
 {
 }
+
+/* DOACROSS POST operation.  */
+
+void
+GOMP_doacross_post (long first, ...)
+{
+  va_list ap;
+  va_start (ap, first);
+  va_end (ap);
+}
+
+/* DOACROSS WAIT operation.  */
+
+void
+GOMP_doacross_wait (long first, ...)
+{
+  va_list ap;
+  va_start (ap, first);
+  va_end (ap);
+}
--- libgomp/loop.c.jj	2015-06-11 10:27:29.000000000 +0200
+++ libgomp/loop.c	2015-09-16 14:21:10.465819707 +0200
@@ -289,6 +289,109 @@ GOMP_loop_ordered_runtime_start (long st
     }
 }
 
+/* The *_doacross_*_start routines are similar.  The only difference is that
+   this work-share construct is initialized to expect an ORDERED(N) - DOACROSS
+   section, and the worksharing loop iterates always from 0 to COUNTS[0] - 1
+   and other COUNTS array elements tell the library number of iterations
+   in the ordered inner loops.  */
+
+static bool
+gomp_loop_doacross_static_start (unsigned ncounts, long *counts,
+				 long chunk_size, long *istart, long *iend)
+{
+  struct gomp_thread *thr = gomp_thread ();
+
+  thr->ts.static_trip = 0;
+  if (gomp_work_share_start (false))
+    {
+      gomp_loop_init (thr->ts.work_share, 0, counts[0], 1,
+		      GFS_STATIC, chunk_size);
+      /* gomp_ordered_static_init (); */
+      gomp_work_share_init_done ();
+    }
+
+  return !gomp_iter_static_next (istart, iend);
+}
+
+static bool
+gomp_loop_doacross_dynamic_start (unsigned ncounts, long *counts,
+				  long chunk_size, long *istart, long *iend)
+{
+  struct gomp_thread *thr = gomp_thread ();
+  bool ret;
+
+  if (gomp_work_share_start (false))
+    {
+      gomp_loop_init (thr->ts.work_share, 0, counts[0], 1,
+		      GFS_DYNAMIC, chunk_size);
+      gomp_work_share_init_done ();
+    }
+
+#ifdef HAVE_SYNC_BUILTINS
+  ret = gomp_iter_dynamic_next (istart, iend);
+#else
+  gomp_mutex_lock (&thr->ts.work_share->lock);
+  ret = gomp_iter_dynamic_next_locked (istart, iend);
+  gomp_mutex_unlock (&thr->ts.work_share->lock);
+#endif
+
+  return ret;
+}
+
+static bool
+gomp_loop_doacross_guided_start (unsigned ncounts, long *counts,
+				 long chunk_size, long *istart, long *iend)
+{
+  struct gomp_thread *thr = gomp_thread ();
+  bool ret;
+
+  if (gomp_work_share_start (false))
+    {
+      gomp_loop_init (thr->ts.work_share, 0, counts[0], 1,
+		      GFS_GUIDED, chunk_size);
+      gomp_work_share_init_done ();
+    }
+
+#ifdef HAVE_SYNC_BUILTINS
+  ret = gomp_iter_guided_next (istart, iend);
+#else
+  gomp_mutex_lock (&thr->ts.work_share->lock);
+  ret = gomp_iter_guided_next_locked (istart, iend);
+  gomp_mutex_unlock (&thr->ts.work_share->lock);
+#endif
+
+  return ret;
+}
+
+bool
+GOMP_loop_doacross_runtime_start (unsigned ncounts, long *counts,
+				  long *istart, long *iend)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  switch (icv->run_sched_var)
+    {
+    case GFS_STATIC:
+      return gomp_loop_doacross_static_start (ncounts, counts,
+					      icv->run_sched_chunk_size,
+					      istart, iend);
+    case GFS_DYNAMIC:
+      return gomp_loop_doacross_dynamic_start (ncounts, counts,
+					       icv->run_sched_chunk_size,
+					       istart, iend);
+    case GFS_GUIDED:
+      return gomp_loop_doacross_guided_start (ncounts, counts,
+					      icv->run_sched_chunk_size,
+					      istart, iend);
+    case GFS_AUTO:
+      /* For now map to schedule(static), later on we could play with feedback
+	 driven choice.  */
+      return gomp_loop_doacross_static_start (ncounts, counts,
+					      0, istart, iend);
+    default:
+      abort ();
+    }
+}
+
 /* The *_next routines are called when the thread completes processing of
    the iteration block currently assigned to it.  If the work-share
    construct is bound directly to a parallel construct, then the iteration
@@ -581,6 +684,13 @@ extern __typeof(gomp_loop_ordered_dynami
 extern __typeof(gomp_loop_ordered_guided_start) GOMP_loop_ordered_guided_start
	__attribute__((alias ("gomp_loop_ordered_guided_start")));
+extern __typeof(gomp_loop_doacross_static_start) GOMP_loop_doacross_static_start
+	__attribute__((alias ("gomp_loop_doacross_static_start")));
+extern __typeof(gomp_loop_doacross_dynamic_start) GOMP_loop_doacross_dynamic_start
+	__attribute__((alias ("gomp_loop_doacross_dynamic_start")));
+extern __typeof(gomp_loop_doacross_guided_start) GOMP_loop_doacross_guided_start
+	__attribute__((alias ("gomp_loop_doacross_guided_start")));
+
 extern __typeof(gomp_loop_static_next) GOMP_loop_static_next
	__attribute__((alias ("gomp_loop_static_next")));
 extern __typeof(gomp_loop_dynamic_next) GOMP_loop_dynamic_next
@@ -641,6 +751,30 @@ GOMP_loop_ordered_guided_start (long sta
 }
 
 bool
+GOMP_loop_doacross_static_start (unsigned ncounts, long *counts,
+				 long chunk_size, long *istart, long *iend)
+{
+  return gomp_loop_doacross_static_start (ncounts, counts, chunk_size,
+					  istart, iend);
+}
+
+bool
+GOMP_loop_doacross_dynamic_start (unsigned ncounts, long *counts,
+				  long chunk_size, long *istart, long *iend)
+{
+  return gomp_loop_doacross_dynamic_start (ncounts, counts, chunk_size,
+					   istart, iend);
+}
+
+bool
+GOMP_loop_doacross_guided_start (unsigned ncounts, long *counts,
+				 long chunk_size, long *istart, long *iend)
+{
+  return gomp_loop_doacross_guided_start (ncounts, counts, chunk_size,
+					  istart, iend);
+}
+
+bool
 GOMP_loop_static_next (long *istart, long *iend)
 {
   return gomp_loop_static_next (istart, iend);
--- libgomp/libgomp_g.h.jj	2015-09-02 12:50:21.000000000 +0200
+++ libgomp/libgomp_g.h	2015-09-17 09:25:23.324960250 +0200
@@ -71,6 +71,15 @@ extern bool GOMP_loop_ordered_dynamic_ne
 extern bool GOMP_loop_ordered_guided_next (long *, long *);
 extern bool GOMP_loop_ordered_runtime_next (long *, long *);
 
+extern bool GOMP_loop_doacross_static_start (unsigned, long *, long, long *,
+					     long *);
+extern bool GOMP_loop_doacross_dynamic_start (unsigned, long *, long, long *,
+					      long *);
+extern bool GOMP_loop_doacross_guided_start (unsigned, long *, long, long *,
+					     long *);
+extern bool GOMP_loop_doacross_runtime_start (unsigned, long *, long *,
+					      long *);
+
 extern void GOMP_parallel_loop_static_start (void (*)(void *), void *,
					     unsigned, long, long, long, long);
 extern void GOMP_parallel_loop_dynamic_start (void (*)(void *), void *,
@@ -168,6 +177,8 @@ extern bool GOMP_loop_ull_ordered_runtim
 
 extern void GOMP_ordered_start (void);
 extern void GOMP_ordered_end (void);
+extern void GOMP_doacross_post (long, ...);
+extern void GOMP_doacross_wait (long, ...);
 
 /* parallel.c */

	Jakub
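P.S. For reference, a rough C-level sketch of the code the new expansion is intended to produce for a simple doacross loop (hand-written illustration only, assuming schedule(dynamic), ordered(2), collapse(1) and zero-based unit-stride loops; body, N1 and N2 are placeholders, and the real expansion of course emits GIMPLE and derives the wait guard from the folded sink offsets):

  /* Per-dimension iteration counts handed to the library.  */
  long counts[2] = { N1, N2 };
  long istart, iend;
  if (GOMP_loop_doacross_dynamic_start (2, counts, 0 /* chunk */,
					&istart, &iend))
    do
      for (long i = istart; i < iend; i++)
	for (long j = 0; j < N2; j++)
	  {
	    /* #pragma omp ordered depend(sink: i-2, j+3), guarded so
	       that we only wait for an iteration that exists.  */
	    if (i - 2 >= 0 && j + 3 < N2)
	      GOMP_doacross_wait (i - 2, j + 3);
	    body (i, j);
	    /* #pragma omp ordered depend(source).  */
	    GOMP_doacross_post (i, j);
	  }
    while (GOMP_loop_dynamic_next (&istart, &iend));
  GOMP_loop_end ();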