Hi!

On 2023-09-08T11:04:24+0200, Tobias Burnus <tob...@codesourcery.com> wrote:
> On 02.08.23 19:00, Andrew Stubbs wrote:
>> The use of the PTX dynamic_smem_size feature means that the low-latency
>> allocator will not work with the PTX 3.1 multilib.

Right: PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1",
and "Requires 'sm_20' or higher".
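
For illustration, the two requirements can be written down as a minimal C
sketch.  The struct, the comparison helper, and
'dynamic_smem_size_available' are hypothetical names, not GCC or libgomp
code, and the target ISA 'sm_NN' is simply encoded as the integer NN.

```c
#include <stdbool.h>

/* Hypothetical representation of a PTX ISA version, e.g. { 4, 1 }.  */
struct ptx_isa_version { int major, minor; };

/* Return <0, 0, >0 if A is older than, equal to, or newer than B.  */
static int
ptx_version_cmp (struct ptx_isa_version a, struct ptx_isa_version b)
{
  if (a.major != b.major)
    return a.major < b.major ? -1 : 1;
  if (a.minor != b.minor)
    return a.minor < b.minor ? -1 : 1;
  return 0;
}

/* Hypothetical check: '%dynamic_smem_size' was introduced in PTX ISA 4.1
   and requires 'sm_20' or higher.  SM is the NN of 'sm_NN'.  */
static bool
dynamic_smem_size_available (struct ptx_isa_version ptx, int sm)
{
  const struct ptx_isa_version needed = { 4, 1 };
  return ptx_version_cmp (ptx, needed) >= 0 && sm >= 20;
}
```

With these definitions, an 'sm_30' target with PTX ISA 3.1 fails the
check, while 'sm_30' with PTX ISA 6.0 passes it.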

> - if I understand it correctly, our default build supports sm_30 and
> uses PTX ISA version 3.1 for it.

It's correct that we currently still build a '-march=sm_30', '-mptx=3.1'
multilib, but it's not the default one (the user has to explicitly
request it with '-mptx=3.1' on the GCC command line).  The default one
had first been raised to '-mptx=6.3':

    commit 8ff0669f6d1d6126b7c010da02fa6532abb5e1ca
    Author:     Tom de Vries <tdevr...@suse.de>
    AuthorDate: Wed Jan 26 14:17:40 2022 +0100
    Commit:     Tom de Vries <tdevr...@suse.de>
    CommitDate: Tue Feb 1 19:28:52 2022 +0100
    
        [nvptx] Update default ptx isa to 6.3
    [...]
    --- gcc/config/nvptx/nvptx.opt
    +++ gcc/config/nvptx/nvptx.opt
    @@ -91,3 +91,3 @@ Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
     mptx=
    -Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
    +Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_6_3)
     Specify the version of the ptx version to use.
    [...]

..., and then lowered again, to at least '-mptx=6.0' (depending on '-misa'):

    commit decde11183bdccc46587d6614b75f3d56a2f2e4a
    Author:     Tom de Vries <tdevr...@suse.de>
    AuthorDate: Fri Feb 4 08:53:52 2022 +0100
    Commit:     Tom de Vries <tdevr...@suse.de>
    CommitDate: Tue Feb 8 13:55:23 2022 +0100
    
        [nvptx] Choose -mptx default based on -misa
    [...]
    --- gcc/config/nvptx/nvptx.cc
    +++ gcc/config/nvptx/nvptx.cc
    [...]
    +static enum ptx_version
    +default_ptx_version_option (void)
    +{
    +  [...]
    +  /* Pick at least 6.0, to enable using bar.warp.sync to have a way to force
    +     warp convergence.  */
    +  res = MAX (res, PTX_VERSION_6_0);
    +  [...]
    +}
    [...]
    --- gcc/config/nvptx/nvptx.opt
    +++ gcc/config/nvptx/nvptx.opt
    @@ -91,3 +91,3 @@ Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
     mptx=
    -Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_6_3)
    +Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option)
     Specify the version of the ptx version to use.
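
A simplified model of the effect of that change, as a C sketch: whatever
PTX ISA version the '-misa' setting alone would call for (passed in here
as 'isa_min', since the actual per-ISA computation is elided above), the
default is raised to at least 6.0, which always satisfies the 4.1 needed
by '%dynamic_smem_size'.  All names are hypothetical; this is not GCC's
code.

```c
#include <stdbool.h>

/* Hypothetical representation of a PTX ISA version, e.g. { 6, 0 }.  */
struct ptx_isa_version { int major, minor; };

/* Return <0, 0, >0 if A is older than, equal to, or newer than B.  */
static int
ptx_version_cmp (struct ptx_isa_version a, struct ptx_isa_version b)
{
  if (a.major != b.major)
    return a.major < b.major ? -1 : 1;
  if (a.minor != b.minor)
    return a.minor < b.minor ? -1 : 1;
  return 0;
}

/* Model of "res = MAX (res, PTX_VERSION_6_0)": ISA_MIN stands in for the
   version derived from '-misa', which is then raised to at least 6.0.  */
static struct ptx_isa_version
model_default_ptx_version (struct ptx_isa_version isa_min)
{
  const struct ptx_isa_version min_default = { 6, 0 };
  return ptx_version_cmp (isa_min, min_default) < 0 ? min_default : isa_min;
}

/* '%dynamic_smem_size' needs PTX ISA 4.1 or higher.  */
static bool
supports_dynamic_smem_size (struct ptx_isa_version v)
{
  const struct ptx_isa_version needed = { 4, 1 };
  return ptx_version_cmp (v, needed) >= 0;
}
```

Only an explicit '-mptx=' below 4.1, such as the '-mptx=3.1' multilib,
bypasses this floor.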

> If so, I think we should mention that
> nvptx GCC has to be configured with --with-arch=sm_... >= sm_53 (=
> supported version >=4.1) and, during compilation, no -march= < that
> configure-time value may be specified. (Cf. also
> https://gcc.gnu.org/install/specific.html#nvptx-x-none )

Given that every ISA GCC/nvptx supports is 'sm_20' or higher, only the
PTX ISA version matters here, and that requirement is met when just using
GCC's defaults.  OK to push "Clarify libgomp nvptx 'omp_low_lat_mem_space'
documentation", see attached?


Regards
 Thomas


From a6cdf1358c8b2f5279517ec7ebeb3336299ea928 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwi...@baylibre.com>
Date: Tue, 12 Nov 2024 09:54:35 +0100
Subject: [PATCH] Clarify libgomp nvptx 'omp_low_lat_mem_space' documentation

PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1", and
"Requires 'sm_20' or higher".  Given that every ISA GCC/nvptx supports is
'sm_20' or higher, only the PTX ISA version matters here, and that
requirement is met when just using GCC's defaults.  Follow-up to
commit e9a19ead498fcc89186b724c6e76854f7751a89b
"openmp, nvptx: low-lat memory access traits".

	libgomp/
	* libgomp.texi: Clarify nvptx 'omp_low_lat_mem_space'
	documentation.
---
 libgomp/libgomp.texi | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6679f6da4b9..9602d70f26e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6968,8 +6968,10 @@ The implementation remark:
       memory-copy functions of the CUDA library.  Higher dimensions will
       call those functions in a loop and are therefore supported.
 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
-      the @code{access} trait is set to @code{cgroup}, the ISA is at least
-      @code{sm_53}, and the PTX version is at least 4.1.  The default pool size
+      @code{access} trait is set to @code{cgroup}, and libgomp has
+      been built for PTX ISA version 4.1 or higher (such as in GCC's
+      default configuration).  @c -mptx=4.1
+      The default pool size
       is 8 kiB per team, but may be adjusted at runtime by setting environment
       variable @code{GOMP_NVPTX_LOWLAT_POOL=@var{bytes}}.  The maximum value is
       limited by the available hardware, and care should be taken that the
-- 
2.34.1
