> -----Original Message-----
> From: Andrew Pinski <pins...@gmail.com>
> Sent: Monday, August 12, 2024 12:28 PM
> To: Prathamesh Kulkarni <prathame...@nvidia.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc@gcc.gnu.org;
> tschwi...@baylibre.com
> Subject: Re: [RFC] Summary of libgomp failures for offloading to nvptx
> from AArch64
> 
> External email: Use caution opening links or attachments
> 
> 
> On Sun, Aug 11, 2024 at 11:36 PM Prathamesh Kulkarni via Gcc
> <gcc@gcc.gnu.org> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <richard.guent...@gmail.com>
> > > Sent: Monday, July 29, 2024 7:18 PM
> > > To: Prathamesh Kulkarni <prathame...@nvidia.com>
> > > Cc: gcc@gcc.gnu.org
> > > Subject: Re: [RFC] Summary of libgomp failures for offloading to
> > > nvptx from AArch64
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Mon, Jul 29, 2024 at 1:35 PM Prathamesh Kulkarni
> > > <prathame...@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <richard.guent...@gmail.com>
> > > > > Sent: Friday, July 26, 2024 6:51 PM
> > > > > To: Prathamesh Kulkarni <prathame...@nvidia.com>
> > > > > Cc: gcc@gcc.gnu.org
> > > > > Subject: Re: [RFC] Summary of libgomp failures for offloading
> to
> > > > > nvptx from AArch64
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Thu, Jul 25, 2024 at 3:36 PM Prathamesh Kulkarni via Gcc
> > > > > <gcc@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > I am working on enabling offloading to nvptx from AAarch64
> host.
> > > > > > As mentioned on wiki
> > > > > >
> (https://gcc.gnu.org/wiki/Offloading#Running_.27make_check.27)
> > > > > > , I ran make check-target-libgomp on AAarch64 host (and no
> > > > > > GPU)
> > > with
> > > > > following results:
> > > > > >
> > > > > >                 === libgomp Summary ===
> > > > > >
> > > > > > # of expected passes            14568
> > > > > > # of unexpected failures        1023
> > > > > > # of expected failures          309
> > > > > > # of untested testcases         54
> > > > > > # of unresolved testcases       992
> > > > > > # of unsupported tests          644
> > > > > >
> > > > > > It seems majority of the tests fail due to the following 4
> > > issues:
> > > > > >
> > > > > > * Compiling a minimal test-case:
> > > > > >
> > > > > > int main()
> > > > > > {
> > > > > >   int x;
> > > > > >   #pragma omp target map (to: x)
> > > > > >   {
> > > > > >     x = 0;
> > > > > >   }
> > > > > >   return x;
> > > > > > }
> > > > > >
> > > > > > Compiling with -fopenmp -foffload=nvptx-none results in
> > > following
> > > > > issues:
> > > > > >
> > > > > > (1) Differing values of NUM_POLY_INT_COEFFS between host and
> > > > > accelerator, which results in following ICE:
> > > > > >
> > > > > > 0x1a6e0a7 pp_quoted_string
> > > > > >         ../../gcc/gcc/pretty-print.cc:2277
> > > > > >  0x1a6ffb3 pp_format(pretty_printer*, text_info*, urlifier
> > > const*)
> > > > > >         ../../gcc/gcc/pretty-print.cc:1634
> > > > > >  0x1a4a3f3
> > > diagnostic_context::report_diagnostic(diagnostic_info*)
> > > > > >         ../../gcc/gcc/diagnostic.cc:1612
> > > > > >  0x1a4a727 diagnostic_impl
> > > > > >         ../../gcc/gcc/diagnostic.cc:1775  0x1a4e20b
> > > > > > fatal_error(unsigned int, char const*, ...)
> > > > > >         ../../gcc/gcc/diagnostic.cc:2218  0xb3088f
> > > > > > lto_input_mode_table(lto_file_decl_data*)
> > > > > >          ../../gcc/gcc/lto-streamer-in.cc:2121
> > > > > >  0x6f5cdf lto_file_finalize
> > > > > >         ../../gcc/gcc/lto/lto-common.cc:2285
> > > > > >  0x6f5cdf lto_create_files_from_ids
> > > > > >         ../../gcc/gcc/lto/lto-common.cc:2309
> > > > > >  0x6f5cdf lto_file_read
> > > > > >         ../../gcc/gcc/lto/lto-common.cc:2364
> > > > > >  0x6f5cdf read_cgraph_and_symbols(unsigned int, char
> const**)
> > > > > >         ../../gcc/gcc/lto/lto-common.cc:2812
> > > > > >  0x6cfb93 lto_main()
> > > > > >         ../../gcc/gcc/lto/lto.cc:658
> > > > > >
> > > > > > This is already tracked in https://gcc.gnu.org/PR96265 (and
> > > > > > related
> > > > > > PR's)
> > > > > >
> > > > > > Streaming out mode_table:
> > > > > > mode = SI, mclass = 2, size = 4, prec = 32 mode = DI, mclass
> =
> > > 2,
> > > > > size
> > > > > > = 8, prec = 64
> > > > > >
> > > > > > Streaming in mode_table (in lto_input_mode_table):
> > > > > > mclass = 2, size = 4, prec = 0 (and then calculates the
> > > > > > correct mode value by iterating over
> > > all
> > > > > > modes of mclass starting from narrowest mode)
> > > > > >
> > > > > > The issue is that the value for prec is not getting
> > > > > > streamed-in correctly for SImode as seen above. While
> > > > > > streaming out from
> > > > > > AArch64
> > > > > host, it is 32, but while streaming in for nvptx, it is 0.
> This
> > > > > happens because of differing values of NUM_POLY_INT_COEFFS
> > > > > between
> > > > > AArch64 and nvptx backend.
> > > > > >
> > > > > > Since NUM_POLY_INT_COEFFS is 2 for aarch64, the streamed-out
> > > > > > values for mode, precision would be <4, 0> and <32, 0>
> > > > > > respectively (streamed-out in bp_pack_poly_value). Both
> zeros
> > > come
> > > > > > from coeffs[1] of size and prec. While streaming in however,
> > > > > > NUM_POLY_INT_COEFFS is
> > > > > 1 for nvptx, and thus it incorrectly treats <4, 0> as size and
> > > > > precision respectively, which is why precision gets streamed
> in
> > > > > as 0, and thus it encounters the above ICE.
> > > > > >
> > > > > > Supporting non VLA code with offloading:
> > > > > >
> > > > > > In the general case, it's hard to support offloading for
> > > arbitrary
> > > > > poly_ints when NUM_POLY_INT_COEFFS differs for host and
> > > accelerator.
> > > > > > For example, it's not possible to represent a degree-2
> > > > > > poly_int like
> > > > > 4 + 4x (as-is) on an accelerator with NUM_POLY_INT_COEFFS ==
> 1.
> > > > > >
> > > > > > However, IIUC, we can support offloading for restricted set
> of
> > > > > > poly_ints whose degree <= accel's NUM_POLY_INT_COEFFS, since
> > > they
> > > > > can
> > > > > > be represented on accelerator ? For a hypothetical example,
> if
> > > > > > host
> > > > > NUM_POLY_INT_COEFFS == 3 and accel NUM_POLY_INT_COEFFS == 2,
> > > > > then
> > > I
> > > > > suppose we could represent a degree 2 poly_int on accelerator,
> > > > > but not a degree 3 poly_int like 3+4x+5x^2 ?
> > > > > >
> > > > > > Based on that, I have come up with following approach in
> > > attached
> > > > > "quick-and-dirty" patch (p-163-2.diff):
> > > > > > Stream-out host NUM_POLY_INT_COEFFS, and while streaming-in
> > > during
> > > > > lto1, compare it with accelerator's NUM_POLY_INT_COEFFS as
> > > follows:
> > > > > >
> > > > > > Stream in host_num_poly_int_coeffs; if
> > > > > > (host_num_poly_int_coeffs == NUM_POLY_INT_COEFFS) //
> > > > > NUM_POLY_INT_COEFFS represents accelerator's value here.
> > > > > > {
> > > > > >     /* Both are equal, proceed to unpacking
> > > > > > NUM_POLY_INT_COEFFS
> > > > > words
> > > > > > from bitstream.  */ } else if (host_num_poly_int_coeffs <
> > > > > > NUM_POLY_INT_COEFFS) {
> > > > > >     /* Unpack host_num_poly_int_coeffs words and zero out
> > > > > > remaining higher coeffs (similar to zero-extension).  */ }
> > > > > > else
> > > {
> > > > > >     /* Unpack host_num_poly_int_coeffs words and ensure that
> > > > > > degree
> > > > > of
> > > > > > streamed-out poly_int <= NUM_POLY_INT_COEFFS.  */ }
> > > > > >
> > > > > > For example, with host NUM_POLY_INT_COEFFS == 2 and accel
> > > > > > NUM_POLY_INT_COEFFS == 1, this will allow streaming of
> > > > > > "degree-
> > > 1"
> > > > > poly_ints like 4+0x (which will degenerate to constant 4), but
> > > give
> > > > > an error for streaming degree-2 poly_int like 4+4x.
> > > > > >
> > > > > > Following this approach, I am assuming we can support
> > > > > > AArch64/nvptx offloading for non VLA code, since poly_ints
> > > > > > used for representing various artefacts like mode_size,
> > > mode_precision, vector length etc.
> > > > > will be degree-1 poly_int for scalar variables and fixed-
> length
> > > > > vectors (and thus degenerate to constants). With the patch
> > > applied,
> > > > > it proceeds forward from this point, but fails at a later
> stage
> > > (see
> > > > > below).
> > > > > >
> > > > > > Does this approach look reasonable for supporting offloading
> > > > > > for non
> > > > > VLA code ?
> > > > > > Are there any cases I may have overlooked, where offloading
> > > > > > will
> > > > > still fail for non-VLA code due to differing
> NUM_POLY_INT_COEFFS
> > > > > issue ?
> > > > >
> > > > > I think I'd change how we stream POLY_INTs and make that
> > > "independent"
> > > > > of NUM_POLY_INT_COEFFS
> > > > > in that I'd stream the effective number of coeffs required -
> as
> > > you
> > > > > suggest "trailing" zero coeffs do not need to be represented.
> > > > > We always stream coeff zero.
> > > > >
> > > > > On read in when that N is bigger than NUM_POLY_INT_COEFFS we
> > > > > have
> > > to
> > > > > error and we zero-fill extra elements (using the type of coeff
> > > zero).
> > > > >
> > > > > > (2) nvptx mkoffload.cc passes -m64/-m32 to host compiler if
> > > > > > -foffload-abi=lp64/ilp32 After applying workaround for the
> > > > > > above
> > > > > assertion failure, it hits the following error:
> > > > > > gcc: error: unrecognized command-line option '-m64'
> > > > > > nvptx mkoffload: fatal error: ../install/bin/gcc returned 1
> > > > > > exit status compilation terminated.
> > > > > >
> > > > > > This happens because nvptx/mkoffload.cc:compile_native
> passes
> > > > > > -m64/-m32 to host compiler depending on whether offload_abi
> is
> > > > > OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32, and aarch64 backend
> > > > > doesn't recognize these options.
> > > > > >
> > > > > > I suppose a simple solution to check if host_compiler
> supports
> > > > > > a particular command-line option, would be to create a dummy
> C
> > > file
> > > > > and
> > > > > > check if the command "host_compiler <opt> dummy_file.c"
> > > > > > returns zero
> > > > > exit status. Alternative could be to check exit status for
> > > > > "host_compiler <opt> --version", once
> > > > > http://gcc.gnu.org/PR116050
> > > is
> > > > > fixed, but I am not sure if either is an ideal solution.
> > > > >
> > > > > I think an explicit configure when building the offload
> compiler
> > > > > would be cleaner, like simply adding an additional
> > > > > AC_SUBST(host_no_multilibs) var computed from the --enable-as-
> > > > > accelerator-for target triplet?
> > > > >
> > > > > > With workarounds for these 2 issues, the minimal test builds
> > > > > > and
> > > > > runs successfully.
> > > > > >
> > > > > > (3) Assertion error in lto_read_decls during lto1:
> > > > > > There are several failures (~350+) in the testsuite caused
> due
> > > to
> > > > > the
> > > > > > following assert in lto_read_decls:
> > > > > > gcc_assert (data_in->reader_cache->nodes.length () == from +
> > > > > > 1);
> > > > > >
> > > > > > AFAIU, this seems to happen because of presence of LTO_null
> tag.
> > > > > > The following workaround avoids hitting the assert, but am
> not
> > > > > > sure if
> > > > > it's the right fix:
> > > > > >
> > > > > >         t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
> > > > > > +       if (t == NULL_TREE)
> > > > > > +         continue;
> > > > > >         gcc_assert (data_in->reader_cache->nodes.length ()
> ==
> > > from
> > > > > > + 1);
> > > > > >
> > > > > > (FWIW, this was reproducible with the above minimal test,
> but
> > > has
> > > > > > seemingly gone away for it after updating the sources
> > > > > > recently, but still reproduces with libgomp tests like
> > > > > > for-9.c, baseptrs-
> > > 4.C
> > > > > > etc.)
> > > > >
> > > > > It looks like a discrepancy between host and target that
> wasn't
> > > > > properly analyzed to me.  The assert makes sure we read the
> > > expected
> > > > > number of tree nodes - what's the tag when you hit this?
> > > > > Please open a bug.
> > > > Sure, will do shortly.
> > > >
> > > > I checked with for-9.c, and it shows the following state on
> > > > hitting
> > > the assert:
> > > > reader_cache->nodes.length () = 2384, from+1 = 2385, tag:
> LTO_null
> > > >
> > > > From lto_read_decls:
> > > >       unsigned from = data_in->reader_cache->nodes.length ();
> > > >       /* Read and uniquify SCCs as in the input stream.  */
> > > >       enum LTO_tags tag = streamer_read_record_start (&ib_main);
> > > >       if (tag == LTO_tree_scc || tag == LTO_trees)
> > > >
> > > > In this case, tag is LTO_null, and it skips to the else part:
> > > >       t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
> > > >       gcc_assert (data_in->reader_cache->nodes.length () == from
> +
> > > 1);
> > > >
> > > > and lto_input_tree_1 simply returns NULL_TREE if tag is LTO_null
> > > without any additional streaming-in.
> > > > So IIUC, data_in->reader_cache->nodes.length () won't change
> since
> > > > there is no tree node streamed in lto_input_tree_1, and thus
> > > reader_cache->length() remains equal to from (rather than from +
> 1),
> > > which triggers the assert ?
> > >
> > > Interesting.  The thing is that we don't expect LTO_null when
> > > streaming in global decls and types - we wouldn't have streamed a
> > > NULL decl or type.  So can you debug this on the writer side and
> see
> > > where we stream such?
> > Hi Richard,
> > Sorry for late response.
> >
> > The ICE can also be reproduced with an even simpler test-case:
> >
> > int main()
> > {
> >   int x;
> >   #pragma omp target map(x)
> >     x;
> > }
> >
> > Compiling with -O3 -fopenmp -foffload=nvptx-none hits the same
> assert in lto_read_decls.
> >
> > I think the issue here possibly is corrupted streaming of
> > optimization_node, due to presence of AArch64 specific optimization
> options in LTO bytecode.
> >
> > From AArch64 cl_optimization_stream_out:
> >
> >   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> >   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
> >
> >   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
> >   bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
> >   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
> >   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
> >   bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
> >   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
> >
> >   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> >     bp_pack_value (bp, ptr->explicit_mask[i], 64);
> >
> > And nvptx cl_optimization_stream_in:
> >
> >   ptr->x_flag_wrapv_pointer = (signed char ) bp_unpack_var_len_int
> (bp);
> >   ptr->x_debug_nonbind_markers_p = (signed char )
> bp_unpack_var_len_int (bp);
> >   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> >     ptr->explicit_mask[i] = bp_unpack_value (bp, 64);
> >
> > While AArch64 host streams out target-specific opts like
> > flag_aarch64_early_ldp_fusion, aarch64_early_ra etc., there's no
> corresponding stream-in for these options for nvptx. And will thus
> result in invalid streaming-in for ptr->explicit_mask (and subsequent
> data structures).
> >
> > To verify if this was indeed the cause of failure, I (temporarily)
> > changed options in aarch64.opt marked with Optimization to Save, and
> > verified that AArch64 cl_optimization_stream_out matches with nvptx
> cl_optimization_stream_in, which prevents the above ICE.
> 
> Just a quick note, changing it to Save from Optimization will also
> break gcc.target/aarch64/sve/target_optimization-1.c (which was added
> with r15-2344-g8a5f528fba788f) .
> 
> >
> > FWIW, x86_64 cl_optimization_stream_out also similarly streams out
> ix86_unroll_only_small_loops:
> 
> Note that is also a recent change (little over 3 weeks ago,
> r15-2430-ga59c4e496fa916) and I suspect nobody tested offloading after
> that change; it was not backported to the branches either because it
> was known to need to bump the LTO minor version there.
Hi Andrew,
Thanks for the pointers!
I posted a patch to fix this issue: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660222.html

Thanks,
Prathamesh
> 
> Thanks,
> Andrew Pinski
> 
> >
> >   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> >   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
> >   bp_pack_var_len_int (bp, ptr->x_ix86_unroll_only_small_loops);
> >   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> >     bp_pack_value (bp, ptr->explicit_mask[i], 64);
> >
> > which I suppose can cause similar streaming issues ?
> >
> > This change was introduced recently in:
> >
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a59c4e496fa916cb9a484a64
> > 9aa1b4cebd6550f2
> >
> > Which perhaps explains recent failures I am seeing in libgomp tests
> for x86_64->nvptx offloading:
> >
> > FAIL: libgomp.fortran/target-print-1.f90   -O2  (internal compiler
> error: in lto_read_decls, at lto/lto-common.cc:1970)
> > FAIL:
> > libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-variables.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none
> > -O2  (internal compiler error: in lto_read_decls, at
> > lto/lto-common.cc:1970)
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-
> variables.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none
> > -O2  (internal compiler error: in lto_read_decls, at
> > lto/lto-common.cc:1970)
> > FAIL: libgomp.oacc-fortran/print-1-nvptx.f90
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none
> > -O2  (internal compiler error: in lto_read_decls, at
> > lto/lto-common.cc:1970)
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > >
> > > > > >
> > > > > > (4) AAarch64 uses OImode for 256-bit size array, which is
> not
> > > > > supported on nvptx:
> > > > > > This causes ~18 tests to fail.
> > > > > >
> > > > > > Can be reproduced with following simple test:
> > > > > > int main()
> > > > > > {
> > > > > >   long c[4];
> > > > > >   #pragma omp target map(c)
> > > > > >     c[0] = 0;
> > > > > >   return 0;
> > > > > > }
> > > > > >
> > > > > > Compiling with -O2 -fopenmp -foffload=nvptx-none results in:
> > > > > > lto1: fatal error: nvptx-none - 256-bit integer numbers
> > > > > > unsupported (mode 'OI') compilation terminated.
> > > > > > nvptx mkoffload: fatal error:
> > > > > > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-
> gcc
> > > > > returned 1 exit status compilation terminated.
> > > > > >
> > > > > > This happens with AArch64 host because, it uses OImode
> > > > > > (256-bit integer mode) for ARRAY_TYPE (long c[4] fits
> > > > > > 256-bits), which isn't supported on nvptx. This decision is
> > > > > > made during
> > > layout_type
> > > > > > for
> > > > > 'c', which calls mode_for_array, and mode_for_array uses
> target
> > > > > hooks array_mode and array_mode_supported_p to determine
> > > > > target-specific modes to use for ARRAY_TYPE.
> > > > > > For x86_64, AFAIK, it uses BLKmode for ARRAY_TYPE.
> > > > > >
> > > > > > I have attached a "quick-and-dirty" patch (p-166-2.diff)
> which
> > > > > > falls back to using BLKmode for ARRAY_TYPE if offloading is
> > > > > > enabled, and avoids streaming-out target-specific int modes
> in
> > > > > > lto_write_mode_table. I used default_scalar_mode_supported_p
> > > check
> > > > > to test if the int_mode is "generic", but not sure if that's
> > > > > entirely correct. The test compiles and runs OK with patch
> > > applied.
> > > > > I suppose a more general solution would be to somehow
> "intersect"
> > > > > available
> > > > > AArch64 modes with nvptx modes, and use those for offloading ?
> > > > >
> > > > > Hmm, I think we shouldn't stream modes for aggregates but
> > > > > instead let stor-layout re-assign them.  They are not usually
> > > > > expected to
> > > match.
> > > > > Not sure if we should do that generally or only for offload
> > > > > streaming though (it does have an overhead to recompute the
> mode).
> > > > > We could stream VOIDmode here as indicator it needs
> > > > > recomputation (splitting out sub-workers from layout_type,
> > > > > there's one already
> > > for
> > > > > record type).
> > > > Thanks, this indeed sounds better. I will try to work on patch
> > > following this approach.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > With local workarounds for the above 4 issues, running make
> > > check-
> > > > > target-libgomp shows following results:
> > > > > >
> > > > > >                 === libgomp Summary ===
> > > > > >
> > > > > > # of expected passes            16604
> > > > > > # of unexpected failures        10
> > > > > > # of expected failures          309
> > > > > > # of untested testcases         54
> > > > > > # of unresolved testcases       3
> > > > > > # of unsupported tests          643
> > > > > >
> > > > > > The remaining issues are:
> > > > > >
> > > > > > (5) "error: alias definitions not supported in this
> > > configuration"
> > > > > > Fails for pr96390.c, and pr96390.C. This seems to be related
> > > > > > to
> > > > > > https://gcc.gnu.org/PR97102
> > > > > >
> > > > > > (6) Execution Failures:
> > > > > > - libgomp/pr104783.c
> > > > > > - libgomp/pr104783-2.c
> > > > > > I haven't investigated these yet.
> > > > > >
> > > > > > (7) Several warnings fail for libgomp.oacc-c-c++-
> > > common/acc_prof-
> > > > > kernels-1.c and following excess errors:
> > > > > > acc_prof-kernels-1.c:185:9: optimized: assigned OpenACC seq
> > > > > > loop parallelism
> > > > > > acc_prof-kernels-1.c:214:9: optimized: assigned OpenACC seq
> > > > > > loop parallelism
> > > > > > acc_prof-kernels-1.c:245:9: optimized: assigned OpenACC seq
> > > > > > loop parallelism
> > > > > >
> > > > > > So far, I have only been testing make check-target-libgomp.
> > > Should
> > > > > > I
> > > > > be testing any additional parts of the testsuite for
> offloading
> > > > > changes ?
> > > > > >
> > > > > > My initial goals are:
> > > > > > (a) To get AArch64/nvptx offloading to work for above
> minimal
> > > test.
> > > > > > (b) Testsuite results for libgomp on par with x86_64 for non
> > > > > > VLA
> > > > > code (as far as correctness is concerned).
> > > > > > (c) After (a) and (b) are in place, try to enable support
> for
> > > > > offloading with VLA/SVE.
> > > > > >
> > > > > > I am planning to address these issues and will post patches
> > > > > > for the
> > > > > same shortly. I will be grateful for any feedback or
> suggestions
> > > on
> > > > > how to proceed forward.
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh

Reply via email to