Re: [RFC] Enabling SVE with offloading to nvptx

Andrew Stubbs Thu, 02 Jan 2025 03:50:56 -0800

On 27/12/2024 12:29, Prathamesh Kulkarni wrote:

-----Original Message-----
From: Jakub Jelinek <[email protected]>
Sent: 17 December 2024 19:09
To: Prathamesh Kulkarni <[email protected]>
Cc: Andrew Stubbs <[email protected]>; Richard Biener
<[email protected]>; Richard Biener <[email protected]>;
[email protected]; Thomas Schwinge <[email protected]>
Subject: Re: [RFC] Enabling SVE with offloading to nvptx

External email: Use caution opening links or attachments


On Mon, Dec 02, 2024 at 11:17:08AM +0000, Prathamesh Kulkarni wrote:

--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -233,6 +233,12 @@ public:
       flag_finite_loops or similar pragmas state.  */
    unsigned finite_p : 1;

+  /* True if SIMD loop needs delayed lowering of artefacts like
+     safelen and length of omp simd arrays that depend on target's
+     max_vf.  This is true for offloading, when max_vf is computed

after

+     streaming out to device.  */
+  unsigned needs_max_vf_lowering: 1;


Consistency, finite_p above uses space before :, the above line
doesn't.

--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -7170,6 +7170,10 @@ expand_omp_simd (struct omp_region *region,

struct omp_for_data *fd)

        loop->latch = cont_bb;
        add_loop (loop, l1_bb->loop_father);
        loop->safelen = safelen_int;
+      loop->needs_max_vf_lowering = is_in_offload_region (region);
+      if (loop->needs_max_vf_lowering)
+     cfun->curr_properties &= ~PROP_gimple_lomp_dev;


Do you really need this for non-SVE arches?
I mean, could you not set loop->needs_max_vf_lowering if maximum
number of poly_int coeffs is 1?  Or if omp_max_vf returns constant or
something similar?

Well, I guess the issue is not really about VLA vectors but when host and 
device have
different max_vf, and selecting optimal max_vf is not really possible during 
omp-low/omp-expand,
since we don't have device's target info available at this point. Andrew's 
recent patch works around this
limitation by searching for "amdgcn" in OFFLOAD_TARGET_NAMES in omp_max_vf, but 
I guess a more general solution
would be to delay lowering max_vf after streaming-out to device irrespective of 
VLA/VLS vectors ?
For AArch64/nvptx offloading with SVE, where host is VLA and device is VLS, the 
issue is more pronounced (failing to compile),
compared to offloading from VLS host to VLS device (selecting sub-optimal 
max_vf).

That patch fixed a couple of cases. The name matching was only used forthe case where an oversized VF was harmless. The other case where makingthe VF too large would reserve excess memory was deferred to the devicecompiler.

In general, deferring decisions is probably a good idea, but it's notalways possible, or optimal, and in the above case it certainly wasn'tthe easy option. There's already precedent for doing the name match inthe SIMT VF code (for NVPTX), so it was easier and sufficient to do thesame.


Andrew

Re: [RFC] Enabling SVE with offloading to nvptx

Reply via email to