Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

Richard Sandiford Tue, 30 Jul 2024 02:01:20 -0700

Richard Biener <rguent...@suse.de> writes:
> On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
>
>> 
>> 
>> > -----Original Message-----
>> > From: Richard Sandiford <richard.sandif...@arm.com>
>> > Sent: Monday, July 29, 2024 9:43 PM
>> > To: Richard Biener <rguent...@suse.de>
>> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
>> > patc...@gcc.gnu.org
>> > Subject: Re: Support streaming of poly_int for offloading when it's
>> > degree <= accel's NUM_POLY_INT_COEFFS
>> > 
>> > External email: Use caution opening links or attachments
>> > 
>> > 
>> > Richard Biener <rguent...@suse.de> writes:
>> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>> > >
>> > >> Hi Richard,
>> > >> Thanks for your suggestions on RFC email, the attached patch adds
>> > support for streaming of poly_int when it's degree <= accel's
>> > NUM_POLY_INT_COEFFS.
>> > >> The patch changes streaming of poly_int as follows:
>> > >>
>> > >> Streaming out poly_int:
>> > >>
>> > >> degree = poly_int.degree();
>> > >> stream out degree;
>> > >> for (i = 0; i < degree; i++)
>> > >>   stream out poly_int.coeffs[i];
>> > >>
>> > >> Streaming in poly_int:
>> > >>
>> > >> stream in degree;
>> > >> if (degree > NUM_POLY_INT_COEFFS)
>> > >>   fatal_error();
>> > >> stream in coeffs;
>> > >> // Set remaining coeffs to zero in case degree < accel's
>> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>> > >>   poly_int.coeffs[i] = 0;
>> > >>
>> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
>> > linux-gnu.
>> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> > >>
>> > >> I am not quite sure how to test it for offloading since currently
>> > it's (entirely) broken for aarch64->nvptx.
>> > >> I can give a try with x86_64->nvptx offloading if required (altho I
>> > >> guess LTO bootstrap should test streaming changes ?)
>> > >
>> > > +  unsigned degree
>> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
>> > > HOST_WIDE_INT));
>> > >
>> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
>> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
>> > > I'd say we should constrain it to a reasonable upper bound, like 2?
>> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
>> > in
>> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
>> > >
>> > > The patch looks reasonable over all, but Richard S. should have a
>> > say
>> > > about the abstraction you chose and the poly-int adjustment.
>> > 
>> > Sorry if this has been discussed already, but could we instead stream
>> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
>> > It's a target invariant, and poly_int has wormed its way into lots of
>> > things by now :)
>> Hi Richard,
>> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int 
>> (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
>> During streaming-in, it reads back the degree (and streamed coeffs upto 
>> degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since 
>> we can't
>> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. 
>> If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
>> (similar to zero-extension). I posted more details in RFC: 
>> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html


It's not clear to me what the plan is for VLA host + VLS offloading.
Is the streamed data guaranteed to be "clean" of any host-only
VLA stuff?  E.g. if code does:

  #include <arm_sve.h>

  svint32_t *ptr:
  void foo(svint32_t);

  #pragma GCC target "+nosve"

  ...offloading...

is there a guarantee that the offload target won't see the definition
of ptr and foo?

>> 
>> The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to 
>> represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined 
>> by any target,
>> and uses that for packing/unpacking degree of poly_int to/from bitstream, 
>> which should make it independent of the type used for representing 
>> NUM_POLY_INT_COEFFS by
>> the target.
>
> Just as additional comment - maybe we can avoid the POLY_INT_CST tree
> side if we'd consistently "canonicalize" a POLY_INT_CST with zero
> second coeff as INTEGER_CST instead?  This of course doesn't
> generalize to NUM_POLY_INT_COEFFS == 3 vs NUM_POLY_INT_COEFFS == 2.

That should already happen, via:

tree
wide_int_to_tree (tree type, const poly_wide_int_ref &value)
{
  if (value.is_constant ())
    return wide_int_to_tree_1 (type, value.coeffs[0]);
  return build_poly_int_cst (type, value);
}

etc.  So if we see POLY_INT_CSTs that could be INTEGER_CSTs, I think
that'd be a bug.

Thanks,
Richard

Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

Reply via email to