Many thanks. Here is the new patch that fixes the main problem of the previous one (i.e separation of the loop after unroll and jam) as well as the problems raised by you (see comments below).
Now the code with the separation class option looks: ISL AST generated by ISL: { for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) for (int c2 = c0; c2 <= c0 + 3; c2 += 1) S_4(c2, c1); for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) for (int c2 = -((HEIGHT - 1) % 4) + HEIGHT - 1; c2 < HEIGHT; c2 += 1) S_4(c2, c1); } I tried the "unroll" option for AST, two loops are unrolled and the code looks like: ISL AST generated by ISL: { for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) { S_4(c0, c1); S_4(c0 + 1, c1); S_4(c0 + 2, c1); S_4(c0 + 3, c1); } for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) { S_4(-((HEIGHT - 1) % 4) + HEIGHT - 1, c1); if ((HEIGHT - 1) % 4 >= 1) { S_4(-((HEIGHT - 1) % 4) + HEIGHT, c1); if ((HEIGHT - 1) % 4 >= 2) { S_4(-((HEIGHT - 1) % 4) + HEIGHT + 1, c1); if (HEIGHT % 4 == 0) S_4(HEIGHT - 1, c1); } } } } As I don't quite like the unrolling of the second loop, and the GCC standard unrolling is able to unroll the first one, decided to not perfrom the unrolling within graphite. But if desirable, it could be done. The patch remains basically the same, two maps are build, one for the regular unroll and jam (i.e. stride mining) and the other for computing the separating class (i.e. its image is the image of the full tiles on strided dimension). In graphite-isl-ast-to-gimple.c, these two maps are used to build the separation class option and fix the scheduling. The main differences from the previous path is that the option separating class is set on a different dimension and a contraint was added to to the map used to build the separating_class. Now some comments to your message: > > I'm not sure if Tobi or Albert have told you, but the separation_class option > is going to be phased out since its design is fundamentally flawed. > If you can't wait until isl-0.15, then I guess you have no choice but > to use this option, but you should realize that it will remain frozen > in its current broken state (until it is removed at some point). > No, didn't know about the phase out of separation_class option. Anyway, for the time being is the best solution available. My understanding is that this option should always generate correct code, of course as long as the scheduling is correct, but think that had some cases when setting the separating_class leads to incorrect code. For isl_0.15, do you intend to provide some option with a similar functionality ? > > + /* Extract the original and auxiliar maps from pbb->transformed. > > + Set pbb->transformed to the original map. */ > > + psmap = &smap; > > + psmap->n = 0; > > + res = isl_map_foreach_basic_map (pbb->transformed, separate_map, > > (void *)psmap); > > + gcc_assert (res == 0); > > + > > + isl_map_free(pbb->transformed); > > + pbb->transformed = isl_map_copy(psmap->map_arr[0]); > > + > > I have no idea what this pbb->transformed is supposed to represent, > but you appear to be assuming that it has exactly two disjuncts and that > they appear in some order. Now, perhaps you have explicitly > checked that this map has two disjuncts, but then you should > probably bring the check closer since any operation on sets that > you perform could change the internal representation (even of > other sets). However, in no way can you assume that > isl_map_foreach_basic_map will iterate over these disjuncts > in any specific order. > At this point pbb->transformed has two basic maps, one is the mapping for unroll and jam, and one for the full tile for the striped dimension. Introduce a check that differentiate between them as the image of one maps should be included in the other. In fact to prevent any isl side-effects, thought to introduce a new field pbb->transformed_full in the pbb structure to be on the safe side.
Index: gcc/toplev.c =================================================================== --- gcc/toplev.c (revision 217013) +++ gcc/toplev.c (working copy) @@ -1302,11 +1302,12 @@ || flag_loop_block || flag_loop_interchange || flag_loop_strip_mine - || flag_loop_parallelize_all) + || flag_loop_parallelize_all + || flag_loop_unroll_jam) sorry ("Graphite loop optimizations cannot be used (ISL is not available)" "(-fgraphite, -fgraphite-identity, -floop-block, " "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, " - "and -ftree-loop-linear)"); + "-floop-unroll-and-jam, and -ftree-loop-linear)"); #endif /* One region RA really helps to decrease the code size. */ Index: gcc/graphite-optimize-isl.c =================================================================== --- gcc/graphite-optimize-isl.c (revision 217013) +++ gcc/graphite-optimize-isl.c (working copy) @@ -186,7 +186,7 @@ PartialSchedule = isl_band_get_partial_schedule (Band); *Dimensions = isl_band_n_member (Band); - if (DisableTiling) + if (DisableTiling || flag_loop_unroll_jam) return PartialSchedule; /* It does not make any sense to tile a band with just one dimension. */ @@ -305,8 +305,89 @@ isl_constraint_set_constant_si (c, VectorWidth - 1); TilingMap = isl_map_add_constraint (TilingMap, c); - isl_map_dump (TilingMap); + return TilingMap; +} +/* Create an auxiliary map to getPrevectorMap whose image provides the + separation class for full/partial tile separation of the strip-mined + dimension. It is used in graphite_isl_ast_to_gimple.c to set the + corresponding option for AST build. */ +static isl_map * +getPrevectorMap_full (isl_ctx *ctx, int DimToVectorize, + int ScheduleDimensions, + int VectorWidth) +{ + isl_space *Space; + isl_local_space *LocalSpace, *LocalSpaceRange; + isl_set *Modulo; + isl_map *TilingMap; + isl_constraint *c; + isl_aff *Aff; + int PointDimension; /* ip */ + int TileDimension; /* it */ + isl_val *VectorWidthMP; + int i; + + /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/ + + Space = isl_space_alloc (ctx, 0, ScheduleDimensions, ScheduleDimensions + 1); + TilingMap = isl_map_universe (isl_space_copy (Space)); + LocalSpace = isl_local_space_from_space (Space); + PointDimension = ScheduleDimensions; + TileDimension = DimToVectorize; + + /* Create an identity map for everything except DimToVectorize and the + point loop. */ + for (i = 0; i < ScheduleDimensions; i++) + { + if (i == DimToVectorize) + continue; + + c = isl_equality_alloc (isl_local_space_copy (LocalSpace)); + + isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); + isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1); + + TilingMap = isl_map_add_constraint (TilingMap, c); + } + + /* it % 'VectorWidth' = 0 */ + LocalSpaceRange = isl_local_space_range (isl_local_space_copy (LocalSpace)); + Aff = isl_aff_zero_on_domain (LocalSpaceRange); + Aff = isl_aff_set_constant_si (Aff, VectorWidth); + Aff = isl_aff_set_coefficient_si (Aff, isl_dim_in, TileDimension, 1); + + VectorWidthMP = isl_val_int_from_si (ctx, VectorWidth); + Aff = isl_aff_mod_val (Aff, VectorWidthMP); + Modulo = isl_pw_aff_zero_set (isl_pw_aff_from_aff (Aff)); + TilingMap = isl_map_intersect_range (TilingMap, Modulo); + + /* it + ('VectorWidth' - 1) = i0 */ + c = isl_equality_alloc (isl_local_space_copy(LocalSpace)); + isl_constraint_set_coefficient_si (c, isl_dim_out, TileDimension,-1); + isl_constraint_set_coefficient_si (c, isl_dim_in, TileDimension, 1); + isl_constraint_set_constant_si (c, -VectorWidth); + TilingMap = isl_map_add_constraint (TilingMap, c); + + /* ip >= 0 */ + c = isl_inequality_alloc (isl_local_space_copy (LocalSpace)); + isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, 1); + isl_constraint_set_constant_si (c, 0); + TilingMap = isl_map_add_constraint (TilingMap, c); + + /* it <= ip */ + c = isl_inequality_alloc (isl_local_space_copy (LocalSpace)); + isl_constraint_set_coefficient_si (c, isl_dim_out, TileDimension, -1); + isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, 1); + TilingMap = isl_map_add_constraint (TilingMap, c); + + /* ip <= it + ('VectorWidth' - 1) */ + c = isl_inequality_alloc (LocalSpace); + isl_constraint_set_coefficient_si (c, isl_dim_out, TileDimension, 1); + isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, -1); + isl_constraint_set_constant_si (c, VectorWidth - 1); + TilingMap = isl_map_add_constraint (TilingMap, c); + return TilingMap; } @@ -350,17 +431,34 @@ SuffixSchedule); isl_band_list_free (Children); } - else if (EnablePollyVector) + else if (EnablePollyVector || flag_loop_unroll_jam) { + int i; + int depth; + + depth = PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH); + for (i = ScheduleDimensions - 1 ; i >= 0 ; i--) { + if (flag_loop_unroll_jam && (i != (ScheduleDimensions - depth))) + continue; + if (isl_band_member_is_zero_distance (Band, i)) { isl_map *TileMap; isl_union_map *TileUMap; + int stride; - TileMap = getPrevectorMap (ctx, i, ScheduleDimensions, 4); + stride = PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_SIZE); + + TileMap = getPrevectorMap (ctx, i, ScheduleDimensions, stride); + /* Add the map for the full tile on the strided dimension. + It will be used to build the separation class for + loop unroll and jam. */ TileUMap = isl_union_map_from_map (TileMap); + TileMap = getPrevectorMap_full (ctx, i, ScheduleDimensions, stride); + TileUMap = isl_union_map_add_map (TileUMap, TileMap); + TileUMap = isl_union_map_align_params (TileUMap, isl_space_copy (Space)); PartialSchedule = isl_union_map_apply_range Index: gcc/graphite.c =================================================================== --- gcc/graphite.c (revision 217013) +++ gcc/graphite.c (working copy) @@ -383,7 +383,8 @@ || flag_loop_strip_mine || flag_graphite_identity || flag_loop_parallelize_all - || flag_loop_optimize_isl) + || flag_loop_optimize_isl + || flag_loop_unroll_jam) flag_graphite = 1; return flag_graphite != 0; Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 217013) +++ gcc/common.opt (working copy) @@ -1328,6 +1328,10 @@ Common Report Var(flag_loop_block) Optimization Enable Loop Blocking transformation +floop-unroll-and-jam +Common Report Var(flag_loop_unroll_jam) Optimization +Enable Loop Unroll Jam transformation + fgnu-tm Common Report Var(flag_tm) Enable support for GNU transactional memory Index: gcc/graphite-isl-ast-to-gimple.c =================================================================== --- gcc/graphite-isl-ast-to-gimple.c (revision 217013) +++ gcc/graphite-isl-ast-to-gimple.c (working copy) @@ -831,6 +831,152 @@ return schedule; } +/* Auxiliary data structure to separate the maps. */ + +struct sep_map +{ + isl_map *map_arr[2]; + int n; +}; + +/* Auxiliary function used to separate the maps. */ + +static int +separate_map (isl_basic_map *bmap, void *user) +{ + struct sep_map *smap = (struct sep_map *) user; + + smap->map_arr[smap->n] = isl_map_from_basic_map(bmap); + smap->n += 1; + + return 0; +} + +/* Set the separation_class option for unroll and jam. */ + +static __isl_give isl_union_map * +generate_luj_sepclass_opt (scop_p scop, __isl_take isl_union_set *domain, + int dim, int cl) +{ + isl_map *map; + isl_space *space, *space_sep; + isl_ctx *ctx; + isl_union_map *mapu; + int nsched = get_max_schedule_dimensions (scop); + + ctx = scop->ctx; + space_sep = isl_space_alloc (ctx, 0, 1, 1); + space_sep = isl_space_wrap (space_sep); + space_sep = isl_space_set_tuple_name (space_sep, isl_dim_set, + "separation_class"); + space = isl_set_get_space (scop->context); + space_sep = isl_space_align_params (space_sep, isl_space_copy(space)); + space = isl_space_map_from_domain_and_range (space, space_sep); + space = isl_space_add_dims (space,isl_dim_in, nsched); + map = isl_map_universe (space); + isl_map_fix_si (map,isl_dim_out,0,dim); + isl_map_fix_si (map,isl_dim_out,1,cl); + + mapu = isl_union_map_intersect_domain (isl_union_map_from_map (map), + domain); + return (mapu); +} + +/* Compute the separation class for loop unroll and jam. */ + +static __isl_give isl_union_set * +generate_luj_sepclass (scop_p scop) +{ + int i; + poly_bb_p pbb; + isl_union_set *domain_isl; + + domain_isl = isl_union_set_empty (isl_set_get_space (scop->context)); + + FOR_EACH_VEC_ELT (SCOP_BBS (scop), i, pbb) + { + isl_set *bb_domain; + isl_set *bb_domain_s; + isl_set *bb_domain_r; + int res; + + struct sep_map smap, *psmap; + + /* Dead code elimination: when the domain of a PBB is empty, + don't generate code for the PBB. */ + if (isl_set_is_empty (pbb->domain)) + continue; + + /* Extract the original and auxiliar maps from pbb->transformed. + Set pbb->transformed to the original map. */ + psmap = &smap; + psmap->n = 0; + res = isl_map_foreach_basic_map (pbb->transformed, separate_map, + (void *)psmap); + gcc_assert (res == 0); + + /* There is a single basic map. */ + if (psmap->n < 2) + { + isl_map_free(psmap->map_arr[0]); + continue; + } + + bb_domain = isl_set_copy (pbb->domain); + bb_domain_s = isl_set_apply (isl_set_copy (bb_domain), + isl_map_copy(psmap->map_arr[1])); + bb_domain_r = isl_set_apply (bb_domain, isl_map_copy(psmap->map_arr[0])); + + isl_map_free(pbb->transformed); + if (isl_set_is_subset (bb_domain_s,bb_domain_r)) + { + pbb->transformed = isl_map_copy (psmap->map_arr[0]); + domain_isl = + isl_union_set_union (domain_isl, + isl_union_set_from_set (bb_domain_s)); + isl_set_free (bb_domain_r); + } + else if (isl_set_is_subset (bb_domain_r,bb_domain_s)) + { + pbb->transformed = isl_map_copy (psmap->map_arr[1]); + domain_isl = + isl_union_set_union (domain_isl, + isl_union_set_from_set (bb_domain_r)); + isl_set_free (bb_domain_s); + } + else + gcc_assert(0); + + isl_map_free(psmap->map_arr[0]); + isl_map_free(psmap->map_arr[1]); + } + + return domain_isl; +} + +/* Set the AST built options for loop unroll and jam. */ + +static __isl_give isl_union_map * +generate_luj_options (scop_p scop) +{ + isl_union_set *domain_isl; + isl_union_map *options_isl_ss; + isl_union_map *options_isl = + isl_union_map_empty (isl_set_get_space (scop->context)); + int dim = get_max_schedule_dimensions (scop) - 1; + int dim1 = dim - PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH); + + if (!flag_loop_unroll_jam) + return options_isl; + + domain_isl = generate_luj_sepclass (scop); + + options_isl_ss = generate_luj_sepclass_opt (scop, domain_isl, dim1, 0); + options_isl = isl_union_map_union (options_isl, options_isl_ss); + + return options_isl; +} + /* Generates a schedule, which specifies an order used to visit elements in a domain. */ @@ -879,11 +1025,13 @@ } /* Set the separate option for all dimensions. - This helps to reduce control overhead. */ + This helps to reduce control overhead. + Set the options for unroll and jam. */ static __isl_give isl_ast_build * set_options (__isl_take isl_ast_build *control, - __isl_keep isl_union_map *schedule) + __isl_keep isl_union_map *schedule, + __isl_take isl_union_map *opt_luj) { isl_ctx *ctx = isl_union_map_get_ctx (schedule); isl_space *range_space = isl_space_set_alloc (ctx, 0, 1); @@ -894,6 +1042,9 @@ isl_union_set *domain = isl_union_map_range (isl_union_map_copy (schedule)); domain = isl_union_set_universe (domain); isl_union_map *options = isl_union_map_from_domain_and_range (domain, range); + + options = isl_union_map_union (options, opt_luj); + return isl_ast_build_set_options (control, options); } @@ -907,9 +1058,14 @@ isl_options_set_ast_build_atomic_upper_bound (scop->ctx, true); add_parameters_to_ivs_params (scop, ip); + + isl_union_map *options_luj = generate_luj_options (scop); + isl_union_map *schedule_isl = generate_isl_schedule (scop); isl_ast_build *context_isl = generate_isl_context (scop); - context_isl = set_options (context_isl, schedule_isl); + + context_isl = set_options (context_isl, schedule_isl, options_luj); + isl_union_map *dependences = NULL; if (flag_loop_parallelize_all) { Index: gcc/graphite-poly.c =================================================================== --- gcc/graphite-poly.c (revision 217013) +++ gcc/graphite-poly.c (working copy) @@ -276,7 +276,7 @@ /* This pass needs to be run at the final stage, as it does not update the lst. */ - if (flag_loop_optimize_isl) + if (flag_loop_optimize_isl || flag_loop_unroll_jam) transform_done |= optimize_isl (scop); return transform_done; Index: gcc/params.def =================================================================== --- gcc/params.def (revision 217013) +++ gcc/params.def (working copy) @@ -847,6 +847,21 @@ "size of tiles for loop blocking", 51, 0, 0) +/* Size of unrolling factor for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_SIZE, + "loop-unroll-jam-size", + "size of unrolling factor for unroll-and-jam", + 4, 0, 0) + +/* Size of the band formed by the strip mined dimension and the most inner one for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_DEPTH, + "loop-unroll-jam-depth", + "depth of unrolled loop for unroll-and-jam", + 2, 0, 0) + + /* Maximal number of parameters that we allow in a SCoP. */ DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
2014-11-11 Mircea Namolaru <mircea.namol...@inria.fr> * common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-isl-ast-to-gimple.c (struct_sep_map): New structure. * graphite-isl-ast-to-gimple.c (separate_map,generate_luj_sepclass_opt, generate_luj_sepclass,generate_luj_options): New functions to set AST options for unroll and jam. * graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast): Added support for unroll and jam options. * graphite-optimize-isl.c (getPrevectorMap_full): New function for unroll and jam. * graphite-optimize-isl.c (getScheduleForBand,getScheduleForBandList): Added support for unroll and jam. * graphite.c: Support for unroll and jam flag. * graphite-poly.c: Same. * toplev.c: Same.