> -----Original Message----- > From: Jakub Jelinek <ja...@redhat.com> > Sent: Wednesday, July 31, 2024 8:46 PM > To: Prathamesh Kulkarni <prathame...@nvidia.com> > Cc: Richard Biener <rguent...@suse.de>; Richard Sandiford > <richard.sandif...@arm.com>; gcc-patches@gcc.gnu.org > Subject: Re: Support streaming of poly_int for offloading when it's > degree <= accel's NUM_POLY_INT_COEFFS > > External email: Use caution opening links or attachments > > > On Wed, Jul 31, 2024 at 02:58:34PM +0000, Prathamesh Kulkarni wrote: > > There are a couple of issues in the patch: > > (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of > > mode_table, which should work for bp_unpack_poly_value, (since AFAIK, > > it's only called by lto_input_mode_table). However, I am not sure if > we will always call lto_input_mode_table before streaming in poly_int64 > / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a > different place in LTO bytecode ? > > The poly_ints unpacked in lto_input_mode_table obviously are done after > that. > If you use it for streaming in from other sections, you need to check if > they can't be read before the mode table. > And, you don't really need to stream out/in the number for non- > offloading LTO, that should use just NUM_POLY_INT_COEFFS. > > > --- a/gcc/data-streamer-in.cc > > +++ b/gcc/data-streamer-in.cc > > @@ -183,9 +183,7 @@ poly_uint64 > > streamer_read_poly_uint64 (class lto_input_block *ib) { > > poly_uint64 res; > > - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) > > - res.coeffs[i] = streamer_read_uhwi (ib); > > - return res; > > + POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib)) > > Why is this macro and not an inline function or inline function template > oor inline function calling a lambda? > Even if it has to be a macro (I don't see why), it should be defined > such that you need to add ; at the end, ideally not include the return > res; in there because it is just too weird if used like that (or make it > return what will be returned and use return POLY_INT_READ_COMMON...) and > there needs to be a space in between COMMON and (. > > > @@ -194,9 +192,7 @@ poly_int64 > > streamer_read_poly_int64 (class lto_input_block *ib) { > > poly_int64 res; > > - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) > > - res.coeffs[i] = streamer_read_hwi (ib); > > - return res; > > + POLY_INT_READ_COMMON(res, streamer_read_hwi (ib)) > > } > > Ditto. > > + __typeof(x.coeffs[0]) val = streamer_read_coeff; > \ > > You certainly can't use a GCC extension like __typeof here. > Plus missing space. > > > + if (val != 0) > \ > > + fatal_error (input_location, > \ > > + "Degree of %<poly_int%> exceeds " > \ > > Diagnostics shouldn't start with uppercase letter. > > > + "%<NUM_POLY_INT_COEFFS%> (%d)", > \ > > + NUM_POLY_INT_COEFFS); > \ > > + } > \ > > + } > \ > > + > \ > > + return x; > \ > > +} > > + > > --- a/gcc/poly-int.h > > +++ b/gcc/poly-int.h > > @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2> > > ? (void) ((RES).coeffs[I] = VALUE) \ > > : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C > > (VALUE))) > > > > +/* Number of bits needed to represent maximum value of > > + NUM_POLY_INT_COEFFS defined by any target. */ #define > > +MAX_NUM_POLY_INT_COEFFS_BITS (2) > > Why (2) and not just 2? > There should be some static_assert to make sure it is a maximum for any > target. > > > + if (!integer_zerop (val)) > > + fatal_error (input_location, > > + "Degree of %<poly_int%> exceeds " > > Again. Hi Jakub, Thanks for the suggestions. The attached patch streams out NUM_POLY_INT_COEFFS only if offloading is enabled, defines poly_int_read_common to be variadic template, and asserts that host_num_poly_int_coeffs is streamed-in before reading poly_int (AFAIU, the only user of streamer_read_poly_int64 currently is ipa-modref pass).
Patch passes LTO bootstrap+test on aarch64-linux-gnu and in-progress on x86_64-linux-gnu. To test this patch with offloading enabled, I have so far just been testing libgomp (make check-target-libgomp). Should I be testing any additional parts of the testsuite for offloading ? Does the attached patch look in the right direction ? Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> Thanks, Prathamesh > > + "%<NUM_POLY_INT_COEFFS%>"); > > + } > > + } > > } > > Jakub
Partially support streaming of poly_int for offloading. The patch streams out host NUM_POLY_INT_COEFFS, and changes streaming in as follows: if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS) { for (i = 0; i < host_num_poly_int_coeffs; i++) poly_int.coeffs[i] = stream_in coeff; for (; i < NUM_POLY_INT_COEFFS; i++) poly_int.coeffs[i] = 0; } else { for (i = 0; i < NUM_POLY_INT_COEFFS; i++) poly_int.coeffs[i] = stream_in coeff; /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS. */ for (; i < host_num_poly_int_coeffs; i++) { val = stream_in coeff; if (val != 0) error (); } } gcc/ChangeLog: PR ipa/96265 PR ipa/111937 * data-streamer-in.cc (streamer_read_poly_uint64): Remove code for streaming, and call poly_int_read_common instead. (streamer_read_poly_int64): Likewise. * data-streamer.cc (host_num_poly_int_coeffs): New variable. * data-streamer.h (host_num_poly_int_coeffs): Declare. (poly_int_read_common): New function template. (bp_unpack_poly_value): Remove code for streaming and call poly_int_read_common instead. * lto-streamer-in.cc (lto_input_mode_table): Stream-in host NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs. * lto-streamer-out.cc (lto_write_mode_table): Stream out NUM_POLY_INT_COEFFS if offloading is enabled. * poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro. * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust streaming-in of poly_int. Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc index 7dce2928ef0..7b9d8cc0129 100644 --- a/gcc/data-streamer-in.cc +++ b/gcc/data-streamer-in.cc @@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib) poly_uint64 streamer_read_poly_uint64 (class lto_input_block *ib) { - poly_uint64 res; - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) - res.coeffs[i] = streamer_read_uhwi (ib); - return res; + return poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type> + (streamer_read_uhwi, ib); } /* Read a poly_int64 from IB. */ @@ -193,10 +191,8 @@ streamer_read_poly_uint64 (class lto_input_block *ib) poly_int64 streamer_read_poly_int64 (class lto_input_block *ib) { - poly_int64 res; - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) - res.coeffs[i] = streamer_read_hwi (ib); - return res; + return poly_int_read_common<poly_int_traits<poly_int64>::coeff_type> + (streamer_read_hwi, ib); } /* Read gcov_type value from IB. */ diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc index 346b294c72a..5bf2f086d2a 100644 --- a/gcc/data-streamer.cc +++ b/gcc/data-streamer.cc @@ -28,6 +28,12 @@ along with GCC; see the file COPYING3. If not see #include "cgraph.h" #include "data-streamer.h" +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is + stored at beginning of mode_table. While streaming-in, the value is read in + host_num_poly_int_coeffs. */ + +unsigned host_num_poly_int_coeffs = 0; + /* Pack WORK into BP in a variant of uleb format. */ void diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h index 6a2596134ce..c77eacb74ca 100644 --- a/gcc/data-streamer.h +++ b/gcc/data-streamer.h @@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const REAL_VALUE_TYPE *); void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *); unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *); HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *); +extern unsigned host_num_poly_int_coeffs; /* In data-streamer-out.cc */ void streamer_write_zero (struct output_block *); @@ -194,15 +195,53 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits) return val & mask; } +/* Common code for reading poly_int. */ + +template<typename C, typename F, typename ...Args> +poly_int<NUM_POLY_INT_COEFFS, C> +poly_int_read_common (F read_coeff, Args ...args) +{ + poly_int<NUM_POLY_INT_COEFFS, C> x; + unsigned i; + +#ifndef ACCEL_COMPILER + host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS; +#endif + + /* Ensure that we have streamed-in host_num_poly_int_coeffs. */ + gcc_assert (host_num_poly_int_coeffs > 0); + if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS) + { + for (i = 0; i < host_num_poly_int_coeffs; i++) + x.coeffs[i] = read_coeff (args...); + for (; i < NUM_POLY_INT_COEFFS; i++) + x.coeffs[i] = 0; + } + else + { + for (i = 0; i < NUM_POLY_INT_COEFFS; i++) + x.coeffs[i] = read_coeff (args...); + + /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS. */ + for (; i < host_num_poly_int_coeffs; i++) + { + C val = read_coeff (args...); + if (val != 0) + fatal_error (input_location, + "degree of %<poly_int%> exceeds " + "%<NUM_POLY_INT_COEFFS%> (%d)", + NUM_POLY_INT_COEFFS); + } + } + return x; +} + /* Unpacks a polynomial value from the bit-packing context BP in which each coefficient has NBITS bits. */ inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits) { - poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x; - for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i) - x.coeffs[i] = bp_unpack_value (bp, nbits); - return x; + return poly_int_read_common<bitpack_word_t> (bp_unpack_value, bp, nbits); } diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index 2e592be8082..3e2c786fc36 100644 --- a/gcc/lto-streamer-in.cc +++ b/gcc/lto-streamer-in.cc @@ -2013,6 +2013,9 @@ lto_input_mode_table (struct lto_file_decl_data *file_data) header->string_size, vNULL); bitpack_d bp = streamer_read_bitpack (&ib); + host_num_poly_int_coeffs + = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS); + unsigned mode_bits = bp_unpack_value (&bp, 5); unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits); diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc index c329ac8af95..523d6dad221 100644 --- a/gcc/lto-streamer-out.cc +++ b/gcc/lto-streamer-out.cc @@ -3192,6 +3192,9 @@ lto_write_mode_table (void) ob = create_output_block (LTO_section_mode_table); bitpack_d bp = bitpack_create (ob->main_stream); + if (lto_stream_offload_p) + bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS); + /* Ensure that for GET_MODE_INNER (m) != m we have also the inner mode marked. */ for (int i = 0; i < (int) MAX_MACHINE_MODE; i++) diff --git a/gcc/poly-int.h b/gcc/poly-int.h index e3f8d4df716..94708165961 100644 --- a/gcc/poly-int.h +++ b/gcc/poly-int.h @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2> ? (void) ((RES).coeffs[I] = VALUE) \ : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))) +/* Number of bits needed to represent maximum value of + NUM_POLY_INT_COEFFS defined by any target. */ +#define MAX_NUM_POLY_INT_COEFFS_BITS 2 + /* poly_int_full and poly_int_hungry are used internally within poly_int for delegated initializers. poly_int_full indicates that a parameter pack has enough elements to initialize every coefficient. poly_int_hungry diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc index c248a74f7a1..62200ef2d9d 100644 --- a/gcc/tree-streamer-in.cc +++ b/gcc/tree-streamer-in.cc @@ -671,8 +671,35 @@ static void lto_input_ts_poly_tree_pointers (class lto_input_block *ib, class data_in *data_in, tree expr) { - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) - POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in); +#ifndef ACCEL_COMPILER + host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS; +#endif + + /* Ensure that we have streamed-in host_num_poly_int_coeffs. */ + gcc_assert (host_num_poly_int_coeffs > 0); + unsigned i; + if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS) + { + for (i = 0; i < host_num_poly_int_coeffs; i++) + POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in); + + tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0)); + for (; i < NUM_POLY_INT_COEFFS; i++) + POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type); + } + else + { + for (i = 0; i < NUM_POLY_INT_COEFFS; i++) + POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in); + for (; i < host_num_poly_int_coeffs; i++) + { + tree val = stream_read_tree_ref (ib, data_in); + if (!integer_zerop (val)) + fatal_error (input_location, + "degree of %<poly_int%> exceeds " + "%<NUM_POLY_INT_COEFFS%>"); + } + } }