On Sat, 2 Nov 2024, Prathamesh Kulkarni wrote: > > > > -----Original Message----- > > From: Richard Biener <richard.guent...@gmail.com> > > Sent: 29 October 2024 16:46 > > To: Prathamesh Kulkarni <prathame...@nvidia.com> > > Cc: Richard Biener <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas > > Schwinge <tschwi...@baylibre.com>; Jakub Jelinek <ja...@redhat.com> > > Subject: Re: [RFC] Enabling SVE with offloading to nvptx > > > > External email: Use caution opening links or attachments > > > > > > On Mon, Oct 28, 2024 at 1:52 PM Prathamesh Kulkarni via Gcc > > <gcc@gcc.gnu.org> wrote: > > > > > > > -----Original Message----- > > > > From: Richard Biener <rguent...@suse.de> > > > > Sent: 21 October 2024 12:45 > > > > To: Prathamesh Kulkarni <prathame...@nvidia.com> > > > > Cc: gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com>; > > Jakub > > > > Jelinek <ja...@redhat.com> > > > > Subject: RE: [RFC] Enabling SVE with offloading to nvptx > > > > > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > On Fri, 18 Oct 2024, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Richard Biener <rguent...@suse.de> > > > > > > Sent: 17 October 2024 19:18 > > > > > > To: Prathamesh Kulkarni <prathame...@nvidia.com> > > > > > > Cc: gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com> > > > > > > Subject: RE: [RFC] Enabling SVE with offloading to nvptx > > > > > > > > > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > > > > > > > On Thu, 17 Oct 2024, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Richard Biener <rguent...@suse.de> > > > > > > > > Sent: 16 October 2024 13:05 > > > > > > > > To: Prathamesh Kulkarni <prathame...@nvidia.com> > > > > > > > > Cc: gcc@gcc.gnu.org; Thomas Schwinge > > > > > > > > <tschwi...@baylibre.com> > > > > > > > > Subject: Re: [RFC] Enabling SVE with offloading to nvptx > > > > > > > > > > > > > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 15 Oct 2024, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > Testing libgomp with SVE enabled (-mcpu=generic+sve2), > > > > results > > > > > > in > > > > > > > > > ~60 > > > > > > > > UNRESOLVED errors with following error message: > > > > > > > > > > > > > > > > > > lto1: fatal error: degree of 'poly_int' exceeds > > > > > > 'NUM_POLY_INT_COEFFS' > > > > > > > > > compilation terminated. > > > > > > > > > nvptx mkoffload: fatal error: > > > > > > > > > ../../install/bin/aarch64-unknown-linux-gnu-accel-nvptx- > > > > none-g > > > > > > > > > cc > > > > > > > > returned 1 exit status compilation terminated. > > > > > > > > > > > > > > > > > > This behaviour can be reproduced with the following > > simple > > > > > > > > > test-case > > > > > > > > with -fopenmp -foffload=nvptx-none -mcpu=generic+sve2: > > > > > > > > > > > > > > > > > > #define N 1000 > > > > > > > > > int main () > > > > > > > > > { > > > > > > > > > int i; > > > > > > > > > int A[N] = {0}, B[N] = {0}; > > > > > > > > > > > > > > > > > > #pragma omp target map(i), map(tofrom: A), map(from: > > B) > > > > > > > > > #pragma omp simd > > > > > > > > > for (i = 0; i < N; i++) > > > > > > > > > A[i] = A[i] + B[i]; > > > > > > > > > return A[0]; > > > > > > > > > } > > > > > > > > > > > > > > > > > > omplower pass lowers the above loop to the following: > > > > > > > > > > > > > > > > > > D.4576 = .GOMP_USE_SIMT (); > > > > > > > > > if (D.4576 != 0) goto <D.4577>; else > > goto > > > > > > <D.4578>; > > > > > > > > > <D.4577>: > > > > > > > > > { > > > > > > > > > unsigned int D.4586; > > > > > > > > > unsigned int D.4587; > > > > > > > > > int D.4588; > > > > > > > > > void * simduid.5; > > > > > > > > > void * .omp_simt.6; > > > > > > > > > int D.4596; > > > > > > > > > _Bool D.4597; > > > > > > > > > int D.4598; > > > > > > > > > unsigned int D.4599; > > > > > > > > > int D.4600; > > > > > > > > > int D.4601; > > > > > > > > > int * D.4602; > > > > > > > > > int i [value-expr: D.4588]; > > > > > > > > > int i.0; > > > > > > > > > > > > > > > > > > simduid.5 = .GOMP_SIMT_ENTER > > (simduid.5, > > > > > > &D.4588); > > > > > > > > > .omp_simt.6 = .GOMP_SIMT_ENTER_ALLOC > > > > > > (simduid.5); > > > > > > > > > D.4587 = 0; > > > > > > > > > i.0 = 0; > > > > > > > > > #pragma omp simd safelen(32) > > > > > > > > > _simduid_(simduid.5) > > > > > > > > _simt_ linear(i.0:1) linear(i:1) > > > > > > > > > for (i.0 = 0; i.0 < 1000; i.0 = i.0 + > > 1) > > > > > > > > > ... > > > > > > > > > } > > > > > > > > > goto <D.4579>; > > > > > > > > > <D.4578>: > > > > > > > > > { > > > > > > > > > unsigned int D.4603; > > > > > > > > > unsigned int D.4604; > > > > > > > > > int D.4605[0:POLY_INT_CST [15, 16]]; > > > > > > > > > void * simduid.7; > > > > > > > > > unsigned int D.4612; > > > > > > > > > int * D.4613; > > > > > > > > > int D.4614; > > > > > > > > > int i [value-expr: D.4605[D.4604]]; > > > > > > > > > int i.0; > > > > > > > > > > > > > > > > > > D.4604 = 0; > > > > > > > > > i.0 = 0; > > > > > > > > > #pragma omp simd safelen(POLY_INT_CST > > > > > > > > > [16, > > > > > > 16]) > > > > > > > > _simduid_(simduid.7) linear(i.0:1) linear(i:1) > > > > > > > > > ... > > > > > > > > > } > > > > > > > > > <D.4579>: > > > > > > > > > ... > > > > > > > > > > > > > > > > > > For offloading to SIMT based device like nvptx, > > > > scan_omp_simd > > > > > > > > > duplicates lowering of simd pragma into if-else where > > the > > > > if- > > > > > > part > > > > > > > > > contains simt code-path, and else-part contains simd > > code- > > > > path. > > > > > > In > > > > > > > > lower_rec_simd_input_clauses, max_vf is set to 16+16x for > > > > > > > > the > > > > > > above > > > > > > > > case as determined by omp_max_vf, and that becomes length > > of > > > > the > > > > > > omp > > > > > > > > simd > > > > > > > > array: > > > > > > > > > int D.4605[0:POLY_INT_CST [15, 16]]; > > > > > > > > > > > > > > > > > > The issue here is that, the function containing above > > if- > > > > else > > > > > > > > > condition gets streamed out to LTO bytecode including > > the > > > > simd > > > > > > > > > code- > > > > > > > > path and the omp simd array, whose domain is > > > > [0:POLY_INT_CST[15, > > > > > > > > 16]], and thus we get the above error while streaming-in > > > > > > > > POLY_INT_CST in lto_input_ts_poly_tree_pointers on device > > > > side. > > > > > > > > > > > > > > > > > > Note that, the simd code-path is essentially dead-code > > on > > > > > > > > > nvptx, since > > > > > > > > > .GOMP_USE_SIMT() resolves to 1 during omp_device_lower > > > > > > > > > pass, and later optimization passes (ccp2) remove the > > > > > > > > > dead-code > > > > path > > > > > > > > > and unused omp > > > > > > > > simd arrays while compiling to device. So in this case, we > > > > > > > > aren't really mapping POLY_INT_CST from host to device, > > but > > > > > > > > it gets streamed out to device as an artefact of omp simd > > > > lowering. > > > > > > > > > > > > > > > > > > I suppose a proper fix here would be to (somehow) defer > > > > > > > > > lowering of omp pragma simd after streaming out to > > device, > > > > so > > > > > > > > > the device only sees simt code-path, and the host only > > > > > > > > > sees simd code path > > > > > > ? > > > > > > > > > Or perhaps > > > > > > > > clone each function in offload region, one for host and > > one > > > > for > > > > > > SIMT > > > > > > > > device, and only stream the device versions to avoid > > > > > > > > streaming out > > > > > > > > host- specific IR changes ? > > > > > > > > > > > > > > > > There is currently no way to have the host compiler query > > > > > > > > offload target capabilities so the only true fix is to > > delay > > > > OMP > > > > > > > > SIMD lowering to the target. > > > > > > > Um, I thought we could use omp_max_simt_vf from host to > > query > > > > > > > if the > > > > > > offload target is SIMT ? > > > > > > > The function essentially iterates thru OFFLOAD_TARGET_NAMES > > > > > > > and > > > > > > returns non-zero for nvptx. > > > > > > > > > > > > > > > > Are we dependent on the early optimization pipeline being > > > > > > > > run > > > > on > > > > > > the > > > > > > > > host to produce the offload IL? There's some oddball OACC > > > > > > > > passes > > > > > > in > > > > > > > > pass_ipa_oacc. > > > > > > > > > > > > > > > > That said, I'd probably try to produce clones with > > unlowered > > > > IL > > > > > > and > > > > > > > > skip those clones from all processing from that point and > > > > resume > > > > > > in > > > > > > > > the offload compiler. > > > > > > > > > > > > > > > > > I thought of following approaches as workarounds: > > > > > > > > > > > > > > > > I don't think any workaround will fly in the end. Can't > > you > > > > > > simply > > > > > > > > force SVE to be off for offload clones on the host side > > and > > > > > > > > force OMP lowering with ADVSIMD only? > > > > > > > Would it be correct to set: > > > > > > > sctx.max_vf = constant_lower_bound (omp_max_vf ()) > > > > > > > > > > > > > > if function is offloaded and omp_max_vf returns non-constant > > > > > > poly_int, > > > > > > > to force max_vf to be VLS, which will avoid VLA > > vectorization > > > > > > > as in > > > > > > the attached patch ? > > > > > > > > > > > > > > Or should we modify autovectorize_vector_modes hook to > > return > > > > VLS > > > > > > > modes for offloaded functions ? > > > > > > > > > > > > Can't you simply put a target(march=armv8.3a) (or even more > > > > > > basic > > > > > > ISA) on the OMP target clones? Or is the lowering happening > > > > before > > > > > > outlining? > > > > > AFAIU, scan_omp_target in omp-lower creates a "child function" > > but > > > > doesn't outline it. > > > > > The actual outlining seems to be happening during omp-expand > > pass. > > > > > > > > > > > In that case maybe we want to switch the host target into > > > > "offload- > > > > > > lowering-mode" (effectively switching to a basic ISA)? > > > > > Sorry, I didn't understand -- Do you mean we should enforce > > basic > > > > ISA for all the functions including host-only ones ? > > > > > > > > No, not for host-only. > > > > > > > > > Since the issue stems from having host details leaked into IL > > > > > during omp-lowering (and omp-expand), I was wondering if we > > should > > > > > use default versions of the hooks if offloading is enabled in > > > > > omp-lower > > > > and omp-expand, which should avoid host details creeping into > > > > offload IL, sth like in the attached patch ? > > > > > > > > Ick. No. I think we should avoid lowering parts applied when the > > > > pieces are not outlined yet and ensure those offload functions > > have > > > > a "basic ISA" (which the target needs to define) - the target > > > > already has control over the function decl built (IIRC) so it can > > > > attach required target attributes (and we can ensure those are > > > > dropped on the offload compiler side again). > > > Hi Richard, > > > Thanks for the suggestions! > > > > > > From what I understand, omp expansion proceeds in depth-first order, > > > by expanding innermost regions first and then progressively > > outlining > > > them. I tried disabling SIMD lowering during omp-lowering but that > > triggered several errors during omp-expand, and didn't attempt to > > pursue it further. > > > > > > I briefly experimented with adding a new target hook offload_mode, > > > that will set host ISA to "offload mode", for current_function_decl > > > (function containing offload region) when starting to lower offload > > > region, and reset the host ISA on current_function_decl after > > leaving it (essentially enabling/disabling TARGET_FL_SVE in isa_flags > > in AArch64 backend), but not sure if that was entirely correct, and > > dropped it. > > > > > > > > The alternative would be to go with something along your patch in > > > > this mail but do > > > > > > > > poly_uint64 > > > > -omp_max_vf (void) > > > > +omp_max_vf (bool offload) > > > > { > > > > ... > > > > if (offload) > > > > get_me_magic_value_for_configured_offload_device () > > > > > > > > but iff we're doing the lowering before outlining then if we have > > > > configured two different offload devices we'll have to come up > > with > > > > a max_vf that's suitable for all offload devices? I think this > > > > shows a least this part of the lowering is done in the wrong > > place? > > > > > > > > Maybe we can have the max_vf "symbolic"? Like .IFN_MAX_VF () and > > > > lower that on the offload side only? > > > AFAIU, there are three things that depend on max_vf: > > > (1) Setting loop->safelen > > > (2) Setting length of omp simd arrays > > > (3) Computing chunk_size for schedule clause (with simd modifier) > > > > > > We can't use result of an internal function for loop->safelen since > > it's a compile-time artefact. > > > > True. > > > > > And for array length, (at-least) sra pass seems to assume that > > TYPE_MIN/TYPE_MAX are INTEGER_CST. > > > From prepare_iteration_over_array_elts: > > > > > > tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (type)); > > > gcc_assert (TREE_CODE (minidx) == INTEGER_CST); > > > tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (type)); > > > /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - > > 1. */ > > > if (!maxidx) > > > return false; > > > gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); > > > > The issue might be that local vars cannot be "VLA", instead they would > > be lowered to be allocated by alloca(). That means the assertion is > > technically correct. > > > > > The attached patch: > > > (a) Uses a placeholder value (INT_MAX) for max_vf which gets > > assigned > > > to loop->safelen and length of omp simd array if offloading is > > enabled. > > > > I think it's better to identify the loop that needs "max_vf lowering" > > with a new flag and set a conservative value to max_vf as it could be > > interpreted and used for invalid optimization otherwise. > The attached patch adds a new bitfield needs_max_vf_lowering to loop, and > sets that in expand_omp_simd for loops that need > delayed lowering of safelen and omp simd arrays. The patch defines a new > macro OMP_COMMON_MAX_VF (arbitrarily set to 16), > as a placeholder value for max_vf (instead of INT_MAX), and is later replaced > by appropriate max_vf during omp_adjust_max_vf pass. > Does that look OK ? > > > > For the SIMD array size I have no good suggestions - the uses are very > > constrained though, so I suspect any magic value that's not 0 or 1 > > might work. > > > > > (b) For computing chunk_size, using INT_MAX resulted in chunk_size > > > being constant propagated (and harder to recover later), so I added > > a new internal function .GOMP_SIMD_MAX_VF, whose result is assigned to > > chunk_size. > > > > I guess that's OK. > > > > > (c) Adds a new pass pass_omp_adjust_max_vf, just before > > > omp_device_lower, to adjust above 3 artefacts to correctly adjust > > > max_vf, and fold away .GOMP_SIMD_MAX_VF (or should I piggy back it > > on > > > some other pass?) > > > > It does look related enough to pass_omp_device_lower, no? > Well, that's where I put it initially, but the pass has a more stronger > condition for gating: > return (!(fun->curr_properties & PROP_gimple_lomp_dev) > || (flag_openmp > && (cgraph_node::get (fun->decl)->calls_declare_variant_alt > || offload_ind_funcs_p))); > > Which I am not sure will trigger for every offloaded function on both host > and device ? > I am gating the pass simply on offloading_function_p, since we need to lower > SIMD constructs on both host and device for every function > that is offloaded, containing SIMD loops. > > > > I hope Jakub can chime in a bit. > > > > > For the following contrived test: > > > > > > #include <stdlib.h> > > > > > > #define N 1000 > > > int A[N]; > > > int B[N]; > > > > > > int main() > > > { > > > int i; > > > int sum = 0; > > > > > > #pragma omp target map(sum), map(A), map(B), map (i) > > > #pragma omp teams distribute parallel for simd reduction(+:sum) > > schedule(simd:static, 5) > > > for (i = 0; i < N; i++) > > > sum += A[i] * B[i]; > > > return sum; > > > } > > > > > > With patch, omp expand dump shows length of omp simd arrays set to > > > INT_MAX with offloading > > > enabled: > > > int D.5382[2147483647]; > > > int D.5378[2147483647]; > > > > > > and following computation for chunk_size: > > > D.5353 = .GOMP_SIMD_MAX_VF (); > > > D.5354 = D.5353 + 4; > > > D.5355 = .GOMP_SIMD_MAX_VF (); > > > D.5356 = -D.5355; > > > D.5357 = D.5354 & D.5356; > > > D.5358 = D.5348 - D.5347; > > > > > > And after omp_adjust_max_vf pass, the dump shows correct max_vf > > > assigned to length of omp simd array and chunk_size on host side: > > > > > > int D.5382[0:POLY_INT_CST [15, 16]]; > > > int D.5378[0:POLY_INT_CST [15, 16]]; > > > ... > > > _38 = POLY_INT_CST [16, 16]; > > > _39 = _38 + 4; > > > _40 = POLY_INT_CST [16, 16]; > > > _41 = -_40; > > > _42 = _39 & _41; > > > _43 = _35 - _34; > > > > > > and would make similar adjustments for SIMD based devices. > > > For SIMT devices, the patch explicitly sets max_vf to 1, to fold > > > .GOMP_SIMD_MAX_VF and shrink omp simd array (altho I guess it > > doesn't > > > really matter since the simd code-path would be dead-code?) > > > > > > I had a couple of questions: > > > > > > (1) With patch, I am seeing a lot of errors -- "multiple dump files > > found" > > > For eg: > > > libgomp.c++/../libgomp.c-c++-common/target-is-initial-host-2.c: > > > multiple dump files found > > > UNRESOLVED: > > > libgomp.c++/../libgomp.c-c++-common/target-is-initial-host-2.c > > > scan-nvptx-none-offload-tree-dump-times optimized > > > "omp_is_initial_device" 1 > > > > > > The error seems to come from scandump.exp:glob-dump-file: > > > > > > if { $num_files > 1 } { > > > verbose -log "$testcase: multiple dump files found" > > > } > > > > > > This seems to happen because the compiler is passed: > > > -fdump-tree-optimized -foffload-options=-fdump-tree-optimized, which > > results in two optimized dump files. I am not sure tho, why the patch > > specifically triggers this ? > > > > I think the general scan-tree-dump* do not work here, IIRC there was > > work to add offload and lto dump scan variants. > This went away after updating the sources and doing a clean build, I had > possibly screwed up my build dir. > > > > > (2) To check if we're in offload region in omp_adjust_chunk_size, > > the > > > patch defines a new function enclosing_target_region_p, and keeps > > > walking region->outer till it reaches GIMPLE_OMP_TARGET region (or > > > NULL). While it seems to work, I was wondering if there was a better > > > way to do this ? Another option would be to keep a static variable > > target_nesting_level, which is incremented/decremented before/after > > each call to omp_expand_target, similar to one in omp-lower.cc ? I > > tried using omp_maybe_offloaded from omp_adjust_chunk_size, but that > > didn't seem to work. > > > > > > Does the patch look in the right direction ? > > > > I think yes, but I lack the overall OMP lowering picture here (and > > still think we should eventually delay offload target lowering to the > > offload compile and only outline regions on the host) > Right, I initially thought the issue was about if-else lowering of SIMD vs > SIMT, but it's really about host details creeping earlier in offload IL > during omp simd lowering/expansion. > For instance, offloading from AArch64 host with SVE enabled to a SIMD-based > device will also result in same issue above of mismatched degree of > POLY_INT_CST if device's NUM_POLY_INT_COEFFS < 2 ? > > I suppose tho, we don't need to delay all the SIMD lowering constructs > after streaming, but only those parts that are target dependent to avoid > discrepancies in offload IL ? I grepped thru omp-lower and omp-expand, > and it seems to me, in addition to max_vf, only the alignment clause > uses autovectorize_vector_modes, simd_preferred_mode and related_mode > hooks to determine the biggest alignment supported by the target for > vector types. The attached patch thus also adds a new internal function > for representing alignment, and is lowered during omp_adjust_max_vf.
I think the target/host details are only exposed when the "vector decls" are introduced - I do wonder whether that part of the lowering can be delayed, but as said - I hope Jakub would chime in here. > Passes libgomp testing for Aarch64/nvptx offloading (with and without GPU). > Does the patch look OK ? Also for an eye on the patch. Richard. > Thanks, > Prathamesh > > > > Richard. > > > > > Thanks, > > > Prathamesh > > > > > > > > Richard. > > > > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > Thanks, > > > > > > > Prathamesh > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > [1] Set sctx.max_vf to constant_lower_bound(omp_max_vf > > ()) > > > > in > > > > > > > > > lower_rec_simd_input_clauses, if the function is going > > to > > > > > > > > > be offloaded and omp_max_vf returns non-constant > > poly_int. > > > > > > > > > For > > > > > > above > > > > > > > > > case, it sets > > > > > > > > max_vf to 16 instead of 16+16x which seems to resolve the > > > > issue, > > > > > > but > > > > > > > > it'd use suboptimal max VF for host ? This is done in > > patch > > > > > > > > p-283- > > > > > > 2.txt. > > > > > > > > > > > > > > > > > > However, with clean trunk it still seems to use max_vf = > > > > > > > > > 16 > > > > > > after > > > > > > > > disabling the above error. > > > > > > > > > vect dump shows: > > > > > > > > > > > > > > > > > > (compute_affine_dependence > > > > > > > > > ref_a: (*_25)[i.0_51], stmt_a: _26 = (*_25)[i.0_51]; > > > > > > > > > ref_b: (*_23)[i.0_51], stmt_b: (*_23)[i.0_51] = _27; > > > > > > > > > ) -> dependence analysis failed > > > > > > > > > foo.c:10:13: note: dependence distance = 0. > > > > > > > > > foo.c:10:13: note: dependence distance == 0 between > > > > > > (*_23)[i.0_51] > > > > > > > > and (*_23)[i.0_51] > > > > > > > > > foo.c:10:13: missed: bad data dependence. > > > > > > > > > foo.c:10:13: note: ***** Analysis failed with vector > > mode > > > > > > VNx4SI > > > > > > > > > > > > > > > > > > This seems to happen because, loop->safelen is set to 16 > > > > > > > > > by > > > > > > taking > > > > > > > > > MIN(constant_lower_bound(16+16x), INT_MAX) in > > > > expand_omp_simd: > > > > > > > > > > > > > > > > > > if (!poly_int_tree_p (safelen, &val)) > > > > > > > > > safelen_int = 0; > > > > > > > > > else > > > > > > > > > safelen_int = MIN (constant_lower_bound (val), > > > > > > > > > INT_MAX); > > > > > > > > > > > > > > > > > > and fails to vectorize with VLA vectors, because max_vf > > == > > > > 16 > > > > > > and > > > > > > > > min_vf == 4+4x resulting in bad data dependence due to: > > > > > > > > > > > > > > > > > > if (max_vf != MAX_VECTORIZATION_FACTOR > > > > > > > > > && maybe_lt (max_vf, min_vf)) > > > > > > > > > return opt_result::failure_at (vect_location, "bad > > > > > > > > > data dependence.\n"); > > > > > > > > > > > > > > > > > > If safelen was (somehow) set to 16+16x, I guess it could > > > > have > > > > > > used > > > > > > > > VF=4+4x and vectorized with VLA vectors. > > > > > > > > > but I suppose that's a separate issue ? > > > > > > > > > > > > > > > > > > [2] Since the issue seems to be only with streaming out > > > > length > > > > > > of > > > > > > > > > omp simd array when it's POLY_INT_CST, could we perhaps > > > > > > > > > use > > > > a > > > > > > > > > place holder length during omp lowering and compute the > > > > > > > > > correct length after streaming out, so POLY_INT_CST > > > > > > > > > doesn't get leaked into bytecode ? The > > > > > > > > attached patch p-283-3.txt follows this approach by using > > > > bogus > > > > > > > > length INT_MAX in lower_rec_simd_input_clauses if > > offloading > > > > to > > > > > > SIMT > > > > > > > > device and max_vf is non-constant poly_int, and later > > > > computing > > > > > > the > > > > > > > > correct length in beginning of vect pass by setting it to > > > > > > omp_max_vf > > > > > > > > (), but I am not sure if this is entirely correct. > > > > > > > > > I am assuming that creating omp simd array of bogus > > length > > > > > > > > > will not be an issue for nvptx since it will never get > > > > > > > > > referenced and eventually be removed by > > > > > > > > > remove_unused_locals > > > > ? > > > > > > > > > If it'd not be a good idea to > > > > > > > > rely on the pass pipeline to eliminate simd code-path and > > > > > > > > omp simd array while compiling to device, it could be > > > > > > > > possibly > > > > done > > > > > > > > during omp_lower_device pass itself ? > > > > > > > > > > > > > > > > > > [3] While streaming-in POLY_INT_CST, avoid emitting > > error > > > > > > > > > immediately if degree of POLY_INT_CST exceeds accel's > > > > > > > > > NUM_POLY_INT_COEFFS to ignore POLY_INT_CSTs that may > > > > > > > > > potentially occur on dead-code path, and instead mark it > > > > > > > > > as > > > > error_mark_node. > > > > > > > > > For the above case, since > > > > > > > > POLY_INT_CST appears on dead-code path, streaming > > > > > > > > POLY_INT_CST > > > > > > with > > > > > > > > higher degree than accel's NUM_POLY_INT_COEFFS would be > > > > > > "harmless". > > > > > > > > And detect invalid POLY_INT_CST's in expand pass (if it > > > > survives > > > > > > > > till this point), and emit above error, but not sure if > > > > > > > > that'd be the right place ? > > > > > > > > > This is done in p-283-4.txt. > > > > > > > > > > > > > > > > > > All the three patches fix UNRESOLVED tests due to > > > > POLY_INT_CST > > > > > > > > streaming error in libgomp testsuite with - > > mcpu=generic+sve2. > > > > > > > > > (Altho it introduces a strange FAIL for data-5.f90, > > which > > > > > > > > > I > > > > am > > > > > > > > investigating). > > > > > > > > > I would be grateful for suggestions on how to proceed. > > > > > > > > > > > > > > > > > > Signed-off-by: Prathamesh Kulkarni > > > > > > > > > <prathame...@nvidia.com> > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions > > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, > > Germany; > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB > > 36809, > > > > AG > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > -- > > > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, > > AG > > > > > > Nuernberg) > > > > > > > > > > > > > -- > > > > Richard Biener <rguent...@suse.de> > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > > Nuernberg, Germany; > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)