> -----Original Message-----
> From: Jakub Jelinek <ja...@redhat.com>
> Sent: Tuesday, July 30, 2024 3:16 PM
> To: Richard Biener <rguent...@suse.de>
> Cc: Richard Sandiford <richard.sandif...@arm.com>; Prathamesh Kulkarni
> <prathame...@nvidia.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
>
> External email: Use caution opening links or attachments
>
>
> On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > Only "relevant" stuff should be streamed - the offload code and all
> > trees refered to.
>
> Yeah.
>
> > > > I think all current issues are because of poly-* leaking in for
> > > > cases where a non-poly would have worked fine, but I have not
> had
> > > > a look myself.
> > >
> > > One of the cases that Prathamesh mentions is streaming the mode
> sizes.
> > > Are those modes "offload target modes" or "host modes"? It seems
> > > like it shouldn't be an error for the host to have VLA modes per
> se.
> > > It's just that those modes can't be used in the host/offload
> interface.
> >
> > There's a requirement that a mode mapping exists from the host to
> > target enum machine_mode. I don't remember exactly how we compute
> > that mapping and whether streaming of some data (and thus poly-int)
> > are part of this.
>
> During streaming out, the code records what machine modes are being
> streamed (in streamer_mode_table).
> For those modes (and their inner modes) then lto_write_mode_table
> should stream a table with mode details like class, bits, size, inner
> mode, nunits, real mode format if any, etc.
> That table is then streamed in in the offloading compiler and it
> attempts to find corresponding modes (and emits fatal_error if there
> is no such mode; consider say x86_64 long double with XFmode being
> used in offloading code which doesn't have XFmode support).
> Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> rather than int, this has been changed to use bp_pack_poly_value; but
> that relies on the same number of coefficients for poly_int, which is
> not the case when e.g. offloading aarch64 to gcn or nvptx.
Indeed, for the minimal test:
int main()
{
int x;
#pragma omp target map (to: x)
{
x = 0;
}
return x;
}
Streaming out mode_table from AArch64 shows:
mode = SI, mclass = 2, size = 4, prec = 32
mode = DI, mclass = 2, size = 8, prec = 64
While streaming-in for nvptx shows:
mclass = 2, size = 4, prec = 0
The discrepancy happens because of differing value of NUM_POLY_INT_COEFFS
between AArch64 and nvptx.
>From AArch64 it streams out size and prec as <4, 0> and <32, 0> respectively,
>where 0 comes from coeffs[1].
While streaming-in from nvptx, since NUM_POLY_INT_COEFFS is 1, it incorrectly
reads size as 4, and prec as 0.
>
> From what I can see, this mode table handling are the only uses of
> bp_pack_poly_value. So the options are either to stream at the start
> of the mode table the NUM_POLY_INT_COEFFS value and in
> bp_unpack_poly_value pass to it what we've read and fill in any
> remaining coeffs with zeros, or in each bp_pack_poly_value stream the
> number of coefficients and then stream that back in and fill in
> remaining ones (and diagnose if it would try to read non-zero
> coefficient which isn't stored).
This is the approach taken in proposed patch (stream-out degree of poly_int
followed by coeffs).
> I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> least for non-aarch64/riscv targets).
I will try implementing this, thanks.
Thanks,
Prathamesh
>
> Jakub