Richard Biener <rguent...@suse.de> writes:
> On Tue, 30 Jul 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguent...@suse.de> writes:
>> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
>> >
>> >> 
>> >> 
>> >> > -----Original Message-----
>> >> > From: Richard Sandiford <richard.sandif...@arm.com>
>> >> > Sent: Monday, July 29, 2024 9:43 PM
>> >> > To: Richard Biener <rguent...@suse.de>
>> >> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
>> >> > patc...@gcc.gnu.org
>> >> > Subject: Re: Support streaming of poly_int for offloading when it's
>> >> > degree <= accel's NUM_POLY_INT_COEFFS
>> >> > 
>> >> > External email: Use caution opening links or attachments
>> >> > 
>> >> > 
>> >> > Richard Biener <rguent...@suse.de> writes:
>> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>> >> > >
>> >> > >> Hi Richard,
>> >> > >> Thanks for your suggestions on RFC email, the attached patch adds
>> >> > support for streaming of poly_int when it's degree <= accel's
>> >> > NUM_POLY_INT_COEFFS.
>> >> > >> The patch changes streaming of poly_int as follows:
>> >> > >>
>> >> > >> Streaming out poly_int:
>> >> > >>
>> >> > >> degree = poly_int.degree();
>> >> > >> stream out degree;
>> >> > >> for (i = 0; i < degree; i++)
>> >> > >>   stream out poly_int.coeffs[i];
>> >> > >>
>> >> > >> Streaming in poly_int:
>> >> > >>
>> >> > >> stream in degree;
>> >> > >> if (degree > NUM_POLY_INT_COEFFS)
>> >> > >>   fatal_error();
>> >> > >> stream in coeffs;
>> >> > >> // Set remaining coeffs to zero in case degree < accel's
>> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>> >> > >>   poly_int.coeffs[i] = 0;
>> >> > >>
>> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
>> >> > linux-gnu.
>> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> >> > >>
>> >> > >> I am not quite sure how to test it for offloading since currently
>> >> > it's (entirely) broken for aarch64->nvptx.
>> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I
>> >> > >> guess LTO bootstrap should test streaming changes ?)
>> >> > >
>> >> > > +  unsigned degree
>> >> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
>> >> > > HOST_WIDE_INT));
>> >> > >
>> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
>> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
>> >> > > I'd say we should constrain it to a reasonable upper bound, like 2?
>> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
>> >> > in
>> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
>> >> > >
>> >> > > The patch looks reasonable over all, but Richard S. should have a
>> >> > say
>> >> > > about the abstraction you chose and the poly-int adjustment.
>> >> > 
>> >> > Sorry if this has been discussed already, but could we instead stream
>> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
>> >> > It's a target invariant, and poly_int has wormed its way into lots of
>> >> > things by now :)
>> >> Hi Richard,
>> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of 
>> >> poly_int (and streams-out coeffs only up to degree, ignoring the higher 
>> >> zero coeffs).
>> >> During streaming-in, it reads back the degree (and streamed coeffs upto 
>> >> degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, 
>> >> since we can't
>> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < 
>> >> N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set 
>> >> to 0
>> >> (similar to zero-extension). I posted more details in RFC: 
>> >> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
>> 
>> It's not clear to me what the plan is for VLA host + VLS offloading.
>> Is the streamed data guaranteed to be "clean" of any host-only
>> VLA stuff?  E.g. if code does:
>> 
>>   #include <arm_sve.h>
>> 
>>   svint32_t *ptr:
>>   void foo(svint32_t);
>> 
>>   #pragma GCC target "+nosve"
>> 
>>   ...offloading...
>> 
>> is there a guarantee that the offload target won't see the definition
>> of ptr and foo?
>
> No.  If it sees any unsupported poly-* the offload compilation will fail.

Could it fail even if the offloading code doesn't refer to ptr and foo
directly?  Or is only "relevant" stuff streamed?

> I think all current issues are because of poly-* leaking in for cases
> where a non-poly would have worked fine, but I have not had a look
> myself.

One of the cases that Prathamesh mentions is streaming the mode sizes.
Are those modes "offload target modes" or "host modes"?  It seems like
it shouldn't be an error for the host to have VLA modes per se.  It's
just that those modes can't be used in the host/offload interface.

Thanks,
Richard

Reply via email to