On Tue, 30 Jul 2024, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote: > > > >> > >> > >> > -----Original Message----- > >> > From: Richard Sandiford <richard.sandif...@arm.com> > >> > Sent: Monday, July 29, 2024 9:43 PM > >> > To: Richard Biener <rguent...@suse.de> > >> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc- > >> > patc...@gcc.gnu.org > >> > Subject: Re: Support streaming of poly_int for offloading when it's > >> > degree <= accel's NUM_POLY_INT_COEFFS > >> > > >> > External email: Use caution opening links or attachments > >> > > >> > > >> > Richard Biener <rguent...@suse.de> writes: > >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote: > >> > > > >> > >> Hi Richard, > >> > >> Thanks for your suggestions on RFC email, the attached patch adds > >> > support for streaming of poly_int when it's degree <= accel's > >> > NUM_POLY_INT_COEFFS. > >> > >> The patch changes streaming of poly_int as follows: > >> > >> > >> > >> Streaming out poly_int: > >> > >> > >> > >> degree = poly_int.degree(); > >> > >> stream out degree; > >> > >> for (i = 0; i < degree; i++) > >> > >> stream out poly_int.coeffs[i]; > >> > >> > >> > >> Streaming in poly_int: > >> > >> > >> > >> stream in degree; > >> > >> if (degree > NUM_POLY_INT_COEFFS) > >> > >> fatal_error(); > >> > >> stream in coeffs; > >> > >> // Set remaining coeffs to zero in case degree < accel's > >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++) > >> > >> poly_int.coeffs[i] = 0; > >> > >> > >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64- > >> > linux-gnu. > >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress. > >> > >> > >> > >> I am not quite sure how to test it for offloading since currently > >> > it's (entirely) broken for aarch64->nvptx. > >> > >> I can give a try with x86_64->nvptx offloading if required (altho I > >> > >> guess LTO bootstrap should test streaming changes ?) > >> > > > >> > > + unsigned degree > >> > > + = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned > >> > > HOST_WIDE_INT)); > >> > > > >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained > >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary. > >> > > I'd say we should constrain it to a reasonable upper bound, like 2? > >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS > >> > in > >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS. > >> > > > >> > > The patch looks reasonable over all, but Richard S. should have a > >> > say > >> > > about the abstraction you chose and the poly-int adjustment. > >> > > >> > Sorry if this has been discussed already, but could we instead stream > >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int? > >> > It's a target invariant, and poly_int has wormed its way into lots of > >> > things by now :) > >> Hi Richard, > >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of > >> poly_int (and streams-out coeffs only up to degree, ignoring the higher > >> zero coeffs). > >> During streaming-in, it reads back the degree (and streamed coeffs upto > >> degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since > >> we can't > >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < > >> N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set > >> to 0 > >> (similar to zero-extension). I posted more details in RFC: > >> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html > > It's not clear to me what the plan is for VLA host + VLS offloading. > Is the streamed data guaranteed to be "clean" of any host-only > VLA stuff? E.g. if code does: > > #include <arm_sve.h> > > svint32_t *ptr: > void foo(svint32_t); > > #pragma GCC target "+nosve" > > ...offloading... > > is there a guarantee that the offload target won't see the definition > of ptr and foo?
No. If it sees any unsupported poly-* the offload compilation will fail. I think all current issues are because of poly-* leaking in for cases where a non-poly would have worked fine, but I have not had a look myself. > >> > >> The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to > >> represent number of bits needed for max value of NUM_POLY_INT_COEFFS > >> defined by any target, > >> and uses that for packing/unpacking degree of poly_int to/from bitstream, > >> which should make it independent of the type used for representing > >> NUM_POLY_INT_COEFFS by > >> the target. > > > > Just as additional comment - maybe we can avoid the POLY_INT_CST tree > > side if we'd consistently "canonicalize" a POLY_INT_CST with zero > > second coeff as INTEGER_CST instead? This of course doesn't > > generalize to NUM_POLY_INT_COEFFS == 3 vs NUM_POLY_INT_COEFFS == 2. > > That should already happen, via: > > tree > wide_int_to_tree (tree type, const poly_wide_int_ref &value) > { > if (value.is_constant ()) > return wide_int_to_tree_1 (type, value.coeffs[0]); > return build_poly_int_cst (type, value); > } > > etc. So if we see POLY_INT_CSTs that could be INTEGER_CSTs, I think > that'd be a bug. I see. So we should be able to get rid of the POLY_INT_CST changes in the patch (and track down failure to canonicalize to INTEGER_CSTs). For streaming of data structures with poly_int<> we still need to do something and IMO the approach proposed is fine? Richard.