Hi,

this patch removes the control flow that OMP expansion generates for
simple omp parallel loops (identified by the previous patch) and marks
the functions generated from these loops with the attribute
"hsakernel", which also means adding two parameters denoting the
iteration space size and the group size (which I take from the chunk
size, if there is any).
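To make this concrete, the host-side code emitted for such a region
ends up looking roughly like the following sketch (the identifiers are
invented for illustration, this is not the exact code the patch
produces):

  /* Run the kernel only if the loop would iterate at all.  */
  if (n1 < n2)
    {
      iters = (n2 - n1 + step - 1) / step;   /* iteration space size */
      group = iters < 16 ? iters : 16;       /* or the chunk size, if given */
      main._omp_fn.0 (&.omp_data_o, iters, group);
    }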
Before explaining this in more detail, let me clarify two things.
First, for various reasons we do not attempt to make any provisions
for OMP 4.0 yet, so do not look for them in the code.  Second, this is
just a proof-of-concept patch.  I understand that it is quite horrible
in a number of ways, which is a result of my lack of experience in
this area and of fairly unpleasant time pressure caused mainly by
things well beyond our control.  I will wholeheartedly appreciate any
guidance on how to reimplement this properly.

Basically, for expansion into HSA we want the omp parallel loop to
contain only the body of the loop; it should contain neither the
computation of the portion of the workload assigned to a particular
thread nor the loop over that portion.  Each iteration is one thread
for us, and the size of the iteration space must be set by the code
invoking the kernel.  For example, in

  #pragma omp parallel for shared(a,b) firstprivate(n) private(i)
  for (i = 0; i < n; i++)
    a[i] = b[i] * b[i];

we want the generated omp child function to contain only

  i = omp_get_thread_num ();
  a[i] = b[i] * b[i];

and nothing more (modulo loading from the omp_data structure and other
unrelated stuff).  This is implemented by skipping large parts of
expand_omp_for_static_nochunk when such a simple loop is being
expanded.

A far bigger obstacle was that the code invoking the kernel (which
represents the whole parallel construct), and thus code outside of the
parallel construct, must calculate the iteration space size in order
to verify that the loop should be run at all and to provide it as a
kernel parameter.  Here I ran into severe problems caused by variable
remapping: when I attempted to simply move the conditions above the
parallel statement, the variables had already been remapped, resulting
in undefined loads, and I did not find any way of mapping them back.
Eventually I resorted to hiding away the original loop parameters one
more time in the gimple statement itself, but that is of course a
terrible hack.

Another problem, which I have not attempted to solve in this patch, is
how to generate code for both the host and the accelerator.
Basically, we would want OMP expansion to generate two very different
child functions for the OMP parallel loops that we want to turn into
kernels, but the bottom-up structure of OMP expansion makes this very
difficult.  I have not been able to find a public branch for
offloading to Nvidia PTX, but I assume it faces the same problem.
Have you guys attempted to tackle these problems somehow?  Or am I
just completely misguided in my thoughts?

Anyway, as with the previous patches, I have bootstrapped this just to
catch errors and tested it on a number of OMP testcases; it did not
introduce any new failures.  Committed to the hsa branch.

Thanks,

Martin


2014-09-26  Martin Jambor  <mjam...@suse.cz>

        * gimple.c (gimple_build_omp_for): Allocate orig_first_iter.
        * gimple.h (gimple_statement_omp_for): New field orig_first_iter.
        (gimple_omp_for_set_orig_first_iter): New function.
        * gimplify.c (gimplify_omp_for): Use it.
        * omp-low.c (omp_region): New fields req_group_size and
        orig_first_iter.
        (adjust_for_condition): New function.
        (get_omp_for_step_from_incr): Likewise.
        (extract_omp_for_data): Moved some functionality to the above two
        new functions.
        (create_omp_child_function): Do not append hsa attributes to
        child_fn.
        (expand_parallel_call): Handle kernelized parallel regions.
        (expand_omp_for_static_nochunk): Kernelize marked loops.
        (expand_omp_for): Copy orig_first_iter of to-be-kernelized omp
        for statements to the region structure, mark requested group
        size.  Always expand these loops with
        expand_omp_for_static_nochunk.
        * tree-sra.c (ipa_sra_preliminary_function_checks): Test
        TREE_USED.
---
 gcc/gimple.c   |   1 +
 gcc/gimple.h   |  20 +++
 gcc/gimplify.c |  13 ++
 gcc/omp-low.c  | 462 +++++++++++++++++++++++++++++++++++++--------------------
 gcc/tree-sra.c |   2 +-
 5 files changed, 338 insertions(+), 160 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index db76174..4c6f4c2 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -836,6 +836,7 @@ gimple_build_omp_for (gimple_seq body, int kind, tree clauses, size_t collapse,
   gimple_omp_for_set_kind (p, kind);
   p->collapse = collapse;
   p->iter = ggc_cleared_vec_alloc<gimple_omp_for_iter> (collapse);
+  p->orig_first_iter = ggc_cleared_alloc<gimple_omp_for_iter> ();
   if (pre_body)
     gimple_omp_for_set_pre_body (p, pre_body);
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index ec41585..79265fd 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -550,6 +550,11 @@ struct GTY((tag("GSS_OMP_FOR")))
   struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter;
 
   /* [ WORD 11 ]
+     Copy of the first iteration information for the purposes of HSA
+     kernelization.  */
+  struct gimple_omp_for_iter *orig_first_iter;
+
+  /* [ WORD 12 ]
      Pre-body evaluated before the loop body begins.  */
   gimple_seq pre_body;
 };
@@ -5275,6 +5280,21 @@ gimple_omp_for_set_cond (gimple gs, size_t i, enum tree_code cond)
   omp_for_stmt->iter[i].cond = cond;
 }
 
+/* Set the original first dimension iteration information.  */
+
+static inline void
+gimple_omp_for_set_orig_first_iter (gimple gs, tree index, tree initial,
+                                    tree final, tree incr, enum tree_code cond)
+{
+  gimple_statement_omp_for *omp_for_stmt =
+    as_a <gimple_statement_omp_for *> (gs);
+  omp_for_stmt->orig_first_iter->index = index;
+  omp_for_stmt->orig_first_iter->initial = initial;
+  omp_for_stmt->orig_first_iter->final = final;
+  omp_for_stmt->orig_first_iter->incr = copy_node (incr);
+  omp_for_stmt->orig_first_iter->cond = cond;
+}
+
 /* Return the condition code associated with OMP_FOR GS.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 0ebc24c..b014802 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6770,6 +6770,14 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
               == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
               == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
+
+  tree zero_for_init = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0);
+  tree orig_zero_index = TREE_OPERAND (zero_for_init, 0);
+  tree orig_zero_initial = TREE_OPERAND (zero_for_init, 1);
+  tree zero_for_cond = TREE_VEC_ELT (OMP_FOR_COND (for_stmt), 0);
+  enum tree_code orig_zero_cond = TREE_CODE (zero_for_cond);
+  tree orig_zero_final = TREE_OPERAND (zero_for_cond, 1);
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
     {
       t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
@@ -7093,6 +7101,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       gimple_omp_for_set_incr (gfor, i, TREE_OPERAND (t, 1));
     }
 
+  t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), 0);
+  gimple_omp_for_set_orig_first_iter (gfor, orig_zero_index, orig_zero_initial,
+                                      orig_zero_final, TREE_OPERAND (t, 1),
+                                      orig_zero_cond);
+
   gimplify_seq_add_stmt (pre_p, gfor);
   if (ret != GS_ALL_DONE)
     return GS_ERROR;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ea8a2aa..4eca6f9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -124,6 +124,12 @@ struct omp_region
 
   /* True if this region is or is a part of kernelized parallel block.  */
   bool kernelize;
+
+  /* Requested group size of kernelized loops.  */
+  tree req_group_size;
+
+  /* For kernelized for loops, the original iteration information.  */
+  struct gimple_omp_for_iter *orig_first_iter;
 };
 
 /* Context structure.  Used to store information about each parallel
@@ -287,6 +293,63 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or
+   GT_EXPR.  */
+
+static void
+adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2)
+{
+  switch (*cond_code)
+    {
+    case LT_EXPR:
+    case GT_EXPR:
+    case NE_EXPR:
+      break;
+    case LE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+        *n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, 1);
+      else
+        *n2 = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (*n2), *n2,
+                               build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = LT_EXPR;
+      break;
+    case GE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+        *n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, -1);
+      else
+        *n2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (*n2), *n2,
+                               build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = GT_EXPR;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the looping step from INCR, extracted from the gimple omp
+   statement.  */
+
+static tree
+get_omp_for_step_from_incr (location_t loc, tree incr)
+{
+  tree step;
+  switch (TREE_CODE (incr))
+    {
+    case PLUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      break;
+    case POINTER_PLUS_EXPR:
+      step = fold_convert (ssizetype, TREE_OPERAND (incr, 1));
+      break;
+    case MINUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      step = fold_build1_loc (loc, NEGATE_EXPR, TREE_TYPE (step), step);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  return step;
+}
 
 /* Extract the header elements of parallel loop FOR_STMT and store
    them into *FD.  */
@@ -391,58 +454,14 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
       loop->cond_code = gimple_omp_for_cond (for_stmt, i);
       loop->n2 = gimple_omp_for_final (for_stmt, i);
-      switch (loop->cond_code)
-        {
-        case LT_EXPR:
-        case GT_EXPR:
-          break;
-        case NE_EXPR:
-          gcc_assert (gimple_omp_for_kind (for_stmt)
-                      == GF_OMP_FOR_KIND_CILKSIMD
-                      || (gimple_omp_for_kind (for_stmt)
-                          == GF_OMP_FOR_KIND_CILKFOR));
-          break;
-        case LE_EXPR:
-          if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-            loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, 1);
-          else
-            loop->n2 = fold_build2_loc (loc,
-                                        PLUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-                                        build_int_cst (TREE_TYPE (loop->n2), 1));
-          loop->cond_code = LT_EXPR;
-          break;
-        case GE_EXPR:
-          if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-            loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, -1);
-          else
-            loop->n2 = fold_build2_loc (loc,
-                                        MINUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-                                        build_int_cst (TREE_TYPE (loop->n2), 1));
-          loop->cond_code = GT_EXPR;
-          break;
-        default:
-          gcc_unreachable ();
-        }
+      gcc_assert (loop->cond_code != NE_EXPR
+                  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKSIMD
+                  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKFOR);
+      adjust_for_condition (loc, &loop->cond_code, &loop->n2);
 
       t = gimple_omp_for_incr (for_stmt, i);
       gcc_assert (TREE_OPERAND (t, 0) == var);
-      switch (TREE_CODE (t))
-        {
-        case PLUS_EXPR:
-          loop->step = TREE_OPERAND (t, 1);
-          break;
-        case POINTER_PLUS_EXPR:
-          loop->step = fold_convert (ssizetype, TREE_OPERAND (t, 1));
-          break;
-        case MINUS_EXPR:
-          loop->step = TREE_OPERAND (t, 1);
-          loop->step = fold_build1_loc (loc,
-                                        NEGATE_EXPR, TREE_TYPE (loop->step),
-                                        loop->step);
-          break;
-        default:
-          gcc_unreachable ();
-        }
+      loop->step = get_omp_for_step_from_incr (loc, t);
 
       if (simd
           || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
@@ -1946,11 +1965,6 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
         = tree_cons (get_identifier ("omp declare target"),
                      NULL_TREE, DECL_ATTRIBUTES (decl));
 
-  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("hsa"), NULL_TREE,
-                                      DECL_ATTRIBUTES (decl));
-  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("flatten"), NULL_TREE,
-                                      DECL_ATTRIBUTES (decl));
-
   t = build_decl (DECL_SOURCE_LOCATION (decl),
                   RESULT_DECL, NULL_TREE, void_type_node);
   DECL_ARTIFICIAL (t) = 1;
@@ -4453,11 +4467,98 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 
   if (1 && !ws_args && !cond && start_ix == BUILT_IN_GOMP_PARALLEL)
     {
-      vec_alloc (args, 1);
+      tree child_fn = gimple_omp_parallel_child_fn (entry_stmt);
+      vec_alloc (args, region->kernelize ? 3 : 1);
       args->quick_push (t1);
-      t = build_call_expr_loc_vec (UNKNOWN_LOCATION,
-                                   gimple_omp_parallel_child_fn (entry_stmt),
-                                   args);
+
+      if (region->kernelize)
+        {
+          struct gimple_omp_for_iter *pfi = region->inner->orig_first_iter;
+          location_t loc = gimple_location (entry_stmt);
+          tree itype, type = TREE_TYPE (pfi->index);
+          if (POINTER_TYPE_P (type))
+            itype = signed_type_for (type);
+          else
+            itype = type;
+
+          enum tree_code cond_code = pfi->cond;
+          tree n1 = pfi->initial;
+          tree n2 = pfi->final;
+          adjust_for_condition (loc, &cond_code, &n2);
+          tree step = get_omp_for_step_from_incr (loc, pfi->incr);
+
+          n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
+                                         true, NULL_TREE, false,
+                                         GSI_CONTINUE_LINKING);
+          n2 = force_gimple_operand_gsi (&gsi, fold_convert (itype, n2),
+                                         true, NULL_TREE, false,
+                                         GSI_CONTINUE_LINKING);
+
+          t = fold_binary (cond_code, boolean_type_node,
+                           fold_convert (type, n1), fold_convert (type, n2));
+          if (t == NULL_TREE || !integer_onep (t))
+            {
+              gimple cond = gimple_build_cond (cond_code, n1, n2, NULL_TREE,
+                                               NULL_TREE);
+
+              gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
+              edge ef = split_block (bb, cond);
+              ef->flags = EDGE_FALSE_VALUE;
+              ef->probability = REG_BR_PROB_BASE / 2000 - 1;
+              basic_block cbb = create_empty_bb (ef->src);
+              edge et = make_edge (ef->src, cbb, EDGE_TRUE_VALUE);
+              set_immediate_dominator (CDI_DOMINATORS, cbb, ef->src);
+              add_bb_to_loop (cbb, bb->loop_father);
+              et->probability = REG_BR_PROB_BASE - (REG_BR_PROB_BASE / 2000
+                                                    - 1);
+              make_edge (cbb, ef->dest, EDGE_TRUE_VALUE)->flags = EDGE_FALLTHRU;
+              gsi = gsi_start_bb (cbb);
+            }
+
+          step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
+                                           true, NULL_TREE, false,
+                                           GSI_CONTINUE_LINKING);
+          tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
+          t = fold_build2 (PLUS_EXPR, itype, step, t);
+          t = fold_build2 (PLUS_EXPR, itype, t, n2);
+          t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, n1));
+          if (TYPE_UNSIGNED (itype) && cond_code == GT_EXPR)
+            t = fold_build2 (TRUNC_DIV_EXPR, itype,
+                             fold_build1 (NEGATE_EXPR, itype, t),
+                             fold_build1 (NEGATE_EXPR, itype, step));
+          else
+            t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+          t = fold_convert (itype, t);
+          tree n = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false,
+                                             GSI_CONTINUE_LINKING);
+
+          args->quick_push (n);
+          if (region->inner->req_group_size)
+            t = fold_convert (uint32_type_node, region->inner->req_group_size);
+          else
+            {
+              t = build_int_cst (uint32_type_node, 16);
+              t = fold_build2_loc (loc, MIN_EXPR, uint32_type_node,
                                   fold_convert (uint32_type_node, n), t);
+              t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false,
+                                            GSI_CONTINUE_LINKING);
+            }
+          args->quick_push (t);
+
+          DECL_ATTRIBUTES (child_fn) = tree_cons (get_identifier ("hsakernel"),
+                                                  NULL_TREE,
+                                                  DECL_ATTRIBUTES (child_fn));
+        }
+      else
+        DECL_ATTRIBUTES (child_fn) = tree_cons (get_identifier ("hsa"),
+                                                NULL_TREE,
+                                                DECL_ATTRIBUTES (child_fn));
+
+      DECL_ATTRIBUTES (child_fn) = tree_cons (get_identifier ("flatten"),
+                                              NULL_TREE,
+                                              DECL_ATTRIBUTES (child_fn));
+
+      t = build_call_expr_loc_vec (UNKNOWN_LOCATION, child_fn, args);
     }
   else
     {
@@ -5969,9 +6070,9 @@ expand_omp_for_static_nochunk (struct omp_region *region,
                                struct omp_for_data *fd,
                                gimple inner_stmt)
 {
-  tree n, q, s0, e0, e, t, tt, nthreads, threadid;
+  tree n, q, s0 = NULL, e0 = NULL, e = NULL, t, tt, nthreads = NULL, threadid;
   tree type, itype, vmain, vback;
-  basic_block entry_bb, second_bb, third_bb, exit_bb, seq_start_bb;
+  basic_block entry_bb, second_bb = NULL, third_bb = NULL, exit_bb, seq_start_bb;
   basic_block body_bb, cont_bb, collapse_bb = NULL;
   basic_block fin_bb;
   gimple_stmt_iterator gsi;
@@ -6069,12 +6170,13 @@ expand_omp_for_static_nochunk (struct omp_region *region,
         }
       gsi = gsi_last_bb (entry_bb);
     }
-
-  t = build_call_expr (builtin_decl_explicit (get_num_threads), 0);
-  t = fold_convert (itype, t);
-  nthreads = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
-                                       true, GSI_SAME_STMT);
-
+  if (!region->kernelize)
+    {
+      t = build_call_expr (builtin_decl_explicit (get_num_threads), 0);
+      t = fold_convert (itype, t);
+      nthreads = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+                                           true, GSI_SAME_STMT);
+    }
   t = build_call_expr (builtin_decl_explicit (get_thread_num), 0);
   t = fold_convert (itype, t);
   threadid = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
@@ -6101,56 +6203,65 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
                                    true, NULL_TREE, true, GSI_SAME_STMT);
 
-  t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
-  t = fold_build2 (PLUS_EXPR, itype, step, t);
-  t = fold_build2 (PLUS_EXPR, itype, t, n2);
-  t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, n1));
-  if (TYPE_UNSIGNED (itype) && fd->loop.cond_code == GT_EXPR)
-    t = fold_build2 (TRUNC_DIV_EXPR, itype,
-                     fold_build1 (NEGATE_EXPR, itype, t),
-                     fold_build1 (NEGATE_EXPR, itype, step));
-  else
-    t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
-  t = fold_convert (itype, t);
-  n = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
-
-  q = create_tmp_reg (itype, "q");
-  t = fold_build2 (TRUNC_DIV_EXPR, itype, n, nthreads);
-  t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE, true, GSI_SAME_STMT);
-  gsi_insert_before (&gsi, gimple_build_assign (q, t), GSI_SAME_STMT);
-
-  tt = create_tmp_reg (itype, "tt");
-  t = fold_build2 (TRUNC_MOD_EXPR, itype, n, nthreads);
-  t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE, true, GSI_SAME_STMT);
-  gsi_insert_before (&gsi, gimple_build_assign (tt, t), GSI_SAME_STMT);
-
-  t = build2 (LT_EXPR, boolean_type_node, threadid, tt);
-  stmt = gimple_build_cond_empty (t);
-  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+  if (!region->kernelize)
+    {
+      t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
+      t = fold_build2 (PLUS_EXPR, itype, step, t);
+      t = fold_build2 (PLUS_EXPR, itype, t, n2);
+      t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, n1));
+      if (TYPE_UNSIGNED (itype) && fd->loop.cond_code == GT_EXPR)
+        t = fold_build2 (TRUNC_DIV_EXPR, itype,
+                         fold_build1 (NEGATE_EXPR, itype, t),
+                         fold_build1 (NEGATE_EXPR, itype, step));
+      else
+        t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+      t = fold_convert (itype, t);
+      n = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+                                    GSI_SAME_STMT);
+
+      q = create_tmp_reg (itype, "q");
+      t = fold_build2 (TRUNC_DIV_EXPR, itype, n, nthreads);
+      t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE, true,
+                                    GSI_SAME_STMT);
+      gsi_insert_before (&gsi, gimple_build_assign (q, t), GSI_SAME_STMT);
+
+      tt = create_tmp_reg (itype, "tt");
+      t = fold_build2 (TRUNC_MOD_EXPR, itype, n, nthreads);
+      t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE, true,
+                                    GSI_SAME_STMT);
+      gsi_insert_before (&gsi, gimple_build_assign (tt, t), GSI_SAME_STMT);
+
+      t = build2 (LT_EXPR, boolean_type_node, threadid, tt);
+      stmt = gimple_build_cond_empty (t);
+      gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
 
-  second_bb = split_block (entry_bb, stmt)->dest;
-  gsi = gsi_last_bb (second_bb);
-  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+      second_bb = split_block (entry_bb, stmt)->dest;
+      gsi = gsi_last_bb (second_bb);
+      gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
 
-  gsi_insert_before (&gsi, gimple_build_assign (tt, build_int_cst (itype, 0)),
-                     GSI_SAME_STMT);
-  stmt = gimple_build_assign_with_ops (PLUS_EXPR, q, q,
-                                       build_int_cst (itype, 1));
-  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+      gsi_insert_before (&gsi, gimple_build_assign (tt,
+                                                    build_int_cst (itype, 0)),
+                         GSI_SAME_STMT);
+      stmt = gimple_build_assign_with_ops (PLUS_EXPR, q, q,
+                                           build_int_cst (itype, 1));
+      gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
 
-  third_bb = split_block (second_bb, stmt)->dest;
-  gsi = gsi_last_bb (third_bb);
-  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+      third_bb = split_block (second_bb, stmt)->dest;
+      gsi = gsi_last_bb (third_bb);
+      gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
 
-  t = build2 (MULT_EXPR, itype, q, threadid);
-  t = build2 (PLUS_EXPR, itype, t, tt);
-  s0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+      t = build2 (MULT_EXPR, itype, q, threadid);
+      t = build2 (PLUS_EXPR, itype, t, tt);
+      s0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+                                     GSI_SAME_STMT);
 
-  t = fold_build2 (PLUS_EXPR, itype, s0, q);
-  e0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+      t = fold_build2 (PLUS_EXPR, itype, s0, q);
+      e0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+                                     GSI_SAME_STMT);
 
-  t = build2 (GE_EXPR, boolean_type_node, s0, e0);
-  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+      t = build2 (GE_EXPR, boolean_type_node, s0, e0);
+      gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+    }
 
   /* Remove the GIMPLE_OMP_FOR statement.  */
   gsi_remove (&gsi, true);
@@ -6174,7 +6285,7 @@ expand_omp_for_static_nochunk (struct omp_region *region,
       gcc_assert (innerc);
       endvar = OMP_CLAUSE_DECL (innerc);
     }
-  t = fold_convert (itype, s0);
+  t = fold_convert (itype, region->kernelize ? threadid : s0);
   t = fold_build2 (MULT_EXPR, itype, t, step);
   if (POINTER_TYPE_P (type))
     t = fold_build_pointer_plus (n1, t);
@@ -6188,25 +6299,28 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   stmt = gimple_build_assign (startvar, t);
   gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
 
-  t = fold_convert (itype, e0);
-  t = fold_build2 (MULT_EXPR, itype, t, step);
-  if (POINTER_TYPE_P (type))
-    t = fold_build_pointer_plus (n1, t);
-  else
-    t = fold_build2 (PLUS_EXPR, type, t, n1);
-  t = fold_convert (TREE_TYPE (startvar), t);
-  e = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
-                                false, GSI_CONTINUE_LINKING);
-  if (endvar)
+  if (!region->kernelize)
     {
-      stmt = gimple_build_assign (endvar, e);
-      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
-      if (useless_type_conversion_p (TREE_TYPE (fd->loop.v), TREE_TYPE (e)))
-        stmt = gimple_build_assign (fd->loop.v, e);
+      t = fold_convert (itype, e0);
+      t = fold_build2 (MULT_EXPR, itype, t, step);
+      if (POINTER_TYPE_P (type))
+        t = fold_build_pointer_plus (n1, t);
       else
-        stmt = gimple_build_assign_with_ops (NOP_EXPR, fd->loop.v, e,
-                                             NULL_TREE);
-      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+        t = fold_build2 (PLUS_EXPR, type, t, n1);
+      t = fold_convert (TREE_TYPE (startvar), t);
+      e = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+                                    false, GSI_CONTINUE_LINKING);
+      if (endvar)
+        {
+          stmt = gimple_build_assign (endvar, e);
+          gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+          if (useless_type_conversion_p (TREE_TYPE (fd->loop.v), TREE_TYPE (e)))
+            stmt = gimple_build_assign (fd->loop.v, e);
+          else
+            stmt = gimple_build_assign_with_ops (NOP_EXPR, fd->loop.v, e,
+                                                 NULL_TREE);
+          gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+        }
     }
   if (fd->collapse > 1)
     expand_omp_for_init_vars (fd, &gsi, counts, inner_stmt, startvar);
@@ -6218,28 +6332,32 @@ expand_omp_for_static_nochunk (struct omp_region *region,
       gsi = gsi_last_bb (cont_bb);
       stmt = gsi_stmt (gsi);
       gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
-      vmain = gimple_omp_continue_control_use (stmt);
-      vback = gimple_omp_continue_control_def (stmt);
 
-      if (!gimple_omp_for_combined_p (fd->for_stmt))
+      if (!region->kernelize)
         {
-          if (POINTER_TYPE_P (type))
-            t = fold_build_pointer_plus (vmain, step);
-          else
-            t = fold_build2 (PLUS_EXPR, type, vmain, step);
-          t = force_gimple_operand_gsi (&gsi, t,
-                                        DECL_P (vback)
-                                        && TREE_ADDRESSABLE (vback),
-                                        NULL_TREE, true, GSI_SAME_STMT);
-          stmt = gimple_build_assign (vback, t);
-          gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+          vmain = gimple_omp_continue_control_use (stmt);
+          vback = gimple_omp_continue_control_def (stmt);
 
-          t = build2 (fd->loop.cond_code, boolean_type_node,
-                      DECL_P (vback) && TREE_ADDRESSABLE (vback)
-                      ? t : vback, e);
-          gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+          if (!gimple_omp_for_combined_p (fd->for_stmt))
+            {
+              if (POINTER_TYPE_P (type))
+                t = fold_build_pointer_plus (vmain, step);
+              else
+                t = fold_build2 (PLUS_EXPR, type, vmain, step);
+              t = force_gimple_operand_gsi (&gsi, t,
+                                            DECL_P (vback)
+                                            && TREE_ADDRESSABLE (vback),
+                                            NULL_TREE, true, GSI_SAME_STMT);
+              stmt = gimple_build_assign (vback, t);
+              gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+
+              t = build2 (fd->loop.cond_code, boolean_type_node,
+                          DECL_P (vback) && TREE_ADDRESSABLE (vback)
+                          ? t : vback, e);
+              gsi_insert_before (&gsi, gimple_build_cond_empty (t),
+                                 GSI_SAME_STMT);
+            }
         }
-
       /* Remove the GIMPLE_OMP_CONTINUE statement.  */
       gsi_remove (&gsi, true);
@@ -6257,18 +6375,27 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   gsi_remove (&gsi, true);
 
   /* Connect all the blocks.  */
-  ep = make_edge (entry_bb, third_bb, EDGE_FALSE_VALUE);
-  ep->probability = REG_BR_PROB_BASE / 4 * 3;
-  ep = find_edge (entry_bb, second_bb);
-  ep->flags = EDGE_TRUE_VALUE;
-  ep->probability = REG_BR_PROB_BASE / 4;
-  find_edge (third_bb, seq_start_bb)->flags = EDGE_FALSE_VALUE;
-  find_edge (third_bb, fin_bb)->flags = EDGE_TRUE_VALUE;
+  if (!region->kernelize)
+    {
+      ep = make_edge (entry_bb, third_bb, EDGE_FALSE_VALUE);
+      ep->probability = REG_BR_PROB_BASE / 4 * 3;
+      ep = find_edge (entry_bb, second_bb);
+      ep->flags = EDGE_TRUE_VALUE;
+      ep->probability = REG_BR_PROB_BASE / 4;
+      find_edge (third_bb, seq_start_bb)->flags = EDGE_FALSE_VALUE;
+      find_edge (third_bb, fin_bb)->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      remove_edge (find_edge (entry_bb, fin_bb));
+      find_edge (entry_bb, seq_start_bb)->flags = EDGE_FALLTHRU;
+    }
 
   if (!broken_loop)
     {
       ep = find_edge (cont_bb, body_bb);
-      if (gimple_omp_for_combined_p (fd->for_stmt))
+      if (gimple_omp_for_combined_p (fd->for_stmt)
+          || region->kernelize)
         {
           remove_edge (ep);
           ep = NULL;
@@ -6284,16 +6411,23 @@ expand_omp_for_static_nochunk (struct omp_region *region,
         = ep ? EDGE_FALSE_VALUE : EDGE_FALLTHRU;
     }
 
-  set_immediate_dominator (CDI_DOMINATORS, second_bb, entry_bb);
-  set_immediate_dominator (CDI_DOMINATORS, third_bb, entry_bb);
-  set_immediate_dominator (CDI_DOMINATORS, seq_start_bb, third_bb);
+  if (!region->kernelize)
+    {
+      set_immediate_dominator (CDI_DOMINATORS, second_bb, entry_bb);
+      set_immediate_dominator (CDI_DOMINATORS, third_bb, entry_bb);
+      set_immediate_dominator (CDI_DOMINATORS, seq_start_bb, third_bb);
+    }
+  else
+    set_immediate_dominator (CDI_DOMINATORS, seq_start_bb, entry_bb);
   set_immediate_dominator (CDI_DOMINATORS, body_bb,
                            recompute_dominator (CDI_DOMINATORS, body_bb));
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
                            recompute_dominator (CDI_DOMINATORS, fin_bb));
 
-  if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
+  if (!broken_loop
+      && !region->kernelize
+      && !gimple_omp_for_combined_p (fd->for_stmt))
     {
       struct loop *loop = alloc_loop ();
       loop->header = body_bb;
@@ -7278,6 +7412,15 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
   extract_omp_for_data (last_stmt (region->entry), &fd, loops);
   region->sched_kind = fd.sched_kind;
 
+  if (region->kernelize)
+    {
+      gimple_statement_omp_for *omp_for_stmt =
+        as_a <gimple_statement_omp_for *> (last_stmt (region->entry));
+      region->orig_first_iter = omp_for_stmt->orig_first_iter;
+    }
+  else
+    region->orig_first_iter = NULL;
+
   gcc_assert (EDGE_COUNT (region->entry->succs) == 2);
   BRANCH_EDGE (region->entry)->flags &= ~EDGE_ABNORMAL;
   FALLTHRU_EDGE (region->entry)->flags &= ~EDGE_ABNORMAL;
@@ -7300,7 +7443,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
           && !fd.have_ordered)
     {
-      if (fd.chunk_size == NULL)
+      region->req_group_size = fd.chunk_size;
+      if (fd.chunk_size == NULL || region->kernelize)
        expand_omp_for_static_nochunk (region, &fd, inner_stmt);
       else
        expand_omp_for_static_chunk (region, &fd, inner_stmt);
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 8259dba..9e838a9 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -4933,7 +4933,7 @@ has_caller_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
 static bool
 ipa_sra_preliminary_function_checks (struct cgraph_node *node)
 {
-  if (!node->can_be_local_p ())
+  if (TREE_USED (node->decl) || !node->can_be_local_p ())
     {
       if (dump_file)
        fprintf (dump_file, "Function not local to this compilation unit.\n");
-- 
1.8.4.5