> -----Original Message-----
> From: Gcc <gcc-bounces~prathameshk=nvidia....@gcc.gnu.org> On Behalf
> Of Prathamesh Kulkarni via Gcc
> Sent: 02 December 2024 16:47
> To: Jakub Jelinek <ja...@redhat.com>
> Cc: Andrew Stubbs <a...@baylibre.com>; Richard Biener
> <richard.guent...@gmail.com>; Richard Biener <rguent...@suse.de>;
> gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com>
> Subject: RE: [RFC] Enabling SVE with offloading to nvptx
>
> External email: Use caution opening links or attachments
>
>
> > -----Original Message-----
> > From: Jakub Jelinek <ja...@redhat.com>
> > Sent: 28 November 2024 17:39
> > To: Prathamesh Kulkarni <prathame...@nvidia.com>
> > Cc: Andrew Stubbs <a...@baylibre.com>; Richard Biener
> > <richard.guent...@gmail.com>; Richard Biener <rguent...@suse.de>;
> > gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com>
> > Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Nov 14, 2024 at 08:29:26AM +0000, Prathamesh Kulkarni wrote:
> > > + /* True if SIMD loop needs delayed lowering of artefacts like
> > > + safelen and length of omp simd arrays that depend on
> target's
> > > + max_vf. This is true for offloading, when max_vf is computed
> > > + after
> >
> > 2 spaces after ., not just one.
> Fixed in the attached patch.
> >
> > > + streaming out to device. */
> > > + unsigned needs_max_vf_lowering: 1;
> > > +
> > > /* The number of times to unroll the loop. 0 means no
> > information given,
> > > just do what we always do. A value of 1 means do not unroll
> > the loop.
> > > A value of USHRT_MAX means unroll with no specific unrolling
> > factor.
> > > diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index
> > > 80fb1843445..c9581e3d92e 100644
> > > --- a/gcc/omp-expand.cc
> > > +++ b/gcc/omp-expand.cc
> > > @@ -7171,6 +7171,10 @@ expand_omp_simd (struct omp_region *region,
> > struct omp_for_data *fd)
> > > loop->latch = cont_bb;
> > > add_loop (loop, l1_bb->loop_father);
> > > loop->safelen = safelen_int;
> > > + loop->needs_max_vf_lowering = is_in_offload_region
> (region);
> > > + if (loop->needs_max_vf_lowering)
> > > + cfun->curr_properties &= ~PROP_gimple_lomp_dev;
> > > +
> > > if (simduid)
> > > {
> > > loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid); diff
> > > --git a/gcc/omp-low.cc b/gcc/omp-low.cc index
> > 70a2c108fbc..0f68876a2bc
> > > 100644
> > > --- a/gcc/omp-low.cc
> > > +++ b/gcc/omp-low.cc
> > > @@ -4660,7 +4660,16 @@ lower_rec_simd_input_clauses (tree new_var,
> > omp_context *ctx,
> > > }
> > > else
> > > {
> > > - tree atype = build_array_type_nelts (TREE_TYPE (new_var),
> > sctx->max_vf);
> > > + poly_uint64 nelts;
> > > + /* FIXME: If offloading is enabled, and max_vf is poly_int,
> > avoid
> > > + using it as length of omp simd array, and use a placeholder
> > value
> > > + instead to avoid streaming out poly_int to device. The
> arrays
> > > + will be set to correct length later in omp_device_lower
> pass.
> > > + */
> >
> > Likewise. I really don't like introducing FIXME comments. Describe
> > what you want to say, but not with FIXME.
> Fixed.
> >
> > > + if (omp_maybe_offloaded_ctx (ctx) && !sctx-
> > >max_vf.is_constant ())
> > > + nelts = INT_MAX;
> >
> > I'm not convinced INT_MAX is the right placeholder.
> > On 32-bit arches whenever TREE_TYPE (new_var) is 2 or more bytes the
> > array won't fit into address space anymore.
> > Use some magic value like 64 and say in the comment it really
> doesn't
> > matter much, as we'll change it later.
> > 1 probably wouldn't be best, then I think some undesirable
> > optimizations couild trigger (e.g. ignoring the exact value of the
> > index).
> Adjusted to use 64 in the patch.
> >
> > > + else
> > > + nelts = sctx->max_vf;
> > > + tree atype = build_array_type_nelts (TREE_TYPE (new_var),
> > > + nelts);
> > > tree avar = create_tmp_var_raw (atype);
> > > if (TREE_ADDRESSABLE (new_var))
> > > TREE_ADDRESSABLE (avar) = 1;
> > > diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc index
> > > 372b019f9d6..9d6467cc6a6 100644
> > > --- a/gcc/omp-offload.cc
> > > +++ b/gcc/omp-offload.cc
> > > @@ -2618,6 +2618,75 @@ find_simtpriv_var_op (tree *tp, int
> > *walk_subtrees, void *)
> > > return NULL_TREE;
> > > }
> > >
> > > +/* Compute max_vf for target, and accordingly set loop->safelen
> and
> > length
> > > + of omp simd arrays. */
> > > +
> > > +static void
> > > +adjust_max_vf (function *fun)
> > > +{
> > > + if (!fun->has_simduid_loops)
> > > + return;
> > > +
> > > + poly_uint64 max_vf = omp_max_vf (false);
> > > +
> > > + /* FIXME: Since loop->safelen has to be an integer, it's not
> > always possible
> > > + to compare against poly_int. For eg 32 and 16+16x are not
> > comparable at
> > > + compile-time because 16+16x <= 32 for x < 2, but 16+16x > 32
> > for x >= 2.
> > > + Even if we could get runtime VL based on -mcpu/-march, that
> > would not be
> > > + portable across other SVE archs.
> > > +
> > > + For now, use constant_lower_bound (max_vf), as a "safer
> > approximation" to
> > > + max_vf that avoids these issues, with the downside that it
> > will be
> > > + suboptimal max_vf for SVE archs implementing SIMD width >
> 128
> > > + bits. */
> >
> > Again, I don't like FIXMEs and . should be followed by 2 spaces.
> Fixed.
> >
> > > + tree assign_stmt_lhs = gimple_assign_lhs
> > (assign_stmt);
> > > + if (TREE_CODE (assign_stmt_lhs) == ARRAY_REF)
> > > + {
> > > + tree array = TREE_OPERAND (assign_stmt_lhs,
> > 0);
> > > + tree& max = TYPE_MAX_VALUE (TYPE_DOMAIN
> > (TREE_TYPE (array)));
> > > + max = size_int (loop->safelen - 1);
> > > + relayout_decl (array);
> >
> > This is definitely wrong, please don't do that.
> > ARRAY_TYPEs are shared, see build_array_type_1, so by forcefully
> > changing the type in place, you don't change just one particular
> array
> > (e.g. if it is int[INT_MAX]), but all user arrays as well.
> > You need to create a new array type instead, using
> > built_array_type_nelts.
> > See what shrink_simd_arrays does.
> Fixed, thanks.
> >
> > Why do you need that calculate_dominance_info (CDI_DOMINATORS); ?
> Because get_loop_body needs dominated_by_p.
> For eg without calculating dominance info, it results in following ICE
> for libgomp.c/pr81778.c:
>
> 0x2415d63 internal_error(char const*, ...)
> ../../gcc/gcc/diagnostic-global-context.cc:517
> 0x7dd1ab fancy_abort(char const*, int, char const*)
> ../../gcc/gcc/diagnostic.cc:1696
> 0xa7a877 dominated_by_p(cdi_direction, basic_block_def const*,
> basic_block_def const*)
> ../../gcc/gcc/dominance.cc:1130
> 0xa7a877 dominated_by_p(cdi_direction, basic_block_def const*,
> basic_block_def const*)
> ../../gcc/gcc/dominance.cc:1125
> 0x9d9227 dfs_enumerate_from(basic_block_def*, int, bool
> (*)(basic_block_def const*, void const*), basic_block_def**, int, void
> const*)
> ../../gcc/gcc/cfganal.cc:1588
> 0x9f7ac7 get_loop_body_with_size(loop const*, basic_block_def**,
> unsigned int)
> ../../gcc/gcc/cfgloop.cc:872
> 0x9f7ac7 get_loop_body(loop const*)
> ../../gcc/gcc/cfgloop.cc:901
> 0xe197fb adjust_max_vf
> ../../gcc/gcc/omp-offload.cc:2655
>
> > I wonder if it wouldn't be better to reuse tree-vectorizer.cc
> > infrastructure here, note_simd_array_uses in particular, perhaps add
> a
> > helper function in tree-vectorizer.cc for that.
> AFAIU, note_simd_array_uses builds a map from omp simd array ->
> simduid used for indexing it, but we need map from omp simd array ->
> SIMD loop it's used in (for setting length to loop->safelen), and I am
> not sure if note_simd_array_uses would help with that ? IIUC,
> shrink_simd_arrays also seems to use a separate map simduid_to_vf, for
> setting length of omp simd array to desired vf. In the patch, I have
> left it as-is to "collect" omp simd arrays by walking over their uses
> in the SIMD loop, and set length to loop->safelen.
> Does that look OK ?
> >
> > > @@ -2755,7 +2826,10 @@ execute_omp_device_lower ()
> > > rhs = build_int_cst (type, vf);
> > > break;
> > > case IFN_GOMP_MAX_VF:
> > > - rhs = build_int_cst (type, omp_max_vf (false));
> > > + /* Use constant_lower_bound, to keep max_vf consistent
> > with
> > > + adjut_max_vf above. */
> >
> > s/adjut/adjust/
> Fixed.
Hi Jakub,
ping: https://gcc.gnu.org/pipermail/gcc/2024-December/245238.html
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > > + rhs = build_int_cst (type,
> > > + constant_lower_bound (omp_max_vf
> > > + (false)));
> > > break;
> > > case IFN_GOMP_SIMT_ORDERED_PRED:
> > > rhs = vf == 1 ? integer_zero_node : NULL_TREE;
> >
> >
> > Jakub