Hi!

On 2023-09-08T11:04:24+0200, Tobias Burnus <tob...@codesourcery.com> wrote:
> On 02.08.23 19:00, Andrew Stubbs wrote:
>> The use of the PTX dynamic_smem_size feature means that low-latency allocator
>> will not work with the PTX 3.1 multilib.

Right: PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1", and
"Requires 'sm_20' or higher".

> - if I understand it correctly, our default build supports sm_30 and
> uses PTX ISA version 3.1 for it.

It's correct that we currently still build a '-march=sm_30', '-mptx=3.1'
multilib, but it's not the default one (the user has to explicitly request
it with '-mptx=3.1' on the GCC command line).  The default one had first
been raised to '-mptx=6.3':

    commit 8ff0669f6d1d6126b7c010da02fa6532abb5e1ca
    Author:     Tom de Vries <tdevr...@suse.de>
    AuthorDate: Wed Jan 26 14:17:40 2022 +0100
    Commit:     Tom de Vries <tdevr...@suse.de>
    CommitDate: Tue Feb 1 19:28:52 2022 +0100

        [nvptx] Update default ptx isa to 6.3
    [...]
    --- gcc/config/nvptx/nvptx.opt
    +++ gcc/config/nvptx/nvptx.opt
    @@ -91,3 +91,3 @@ Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
     mptx=
    -Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
    +Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_6_3)
     Specify the version of the ptx version to use.
    [...]

..., and then again lowered to '-mptx=6.0':

    commit decde11183bdccc46587d6614b75f3d56a2f2e4a
    Author:     Tom de Vries <tdevr...@suse.de>
    AuthorDate: Fri Feb 4 08:53:52 2022 +0100
    Commit:     Tom de Vries <tdevr...@suse.de>
    CommitDate: Tue Feb 8 13:55:23 2022 +0100

        [nvptx] Choose -mptx default based on -misa
    [...]
    --- gcc/config/nvptx/nvptx.cc
    +++ gcc/config/nvptx/nvptx.cc
    [...]
    +static enum ptx_version
    +default_ptx_version_option (void)
    +{
    +  [...]
    +  /* Pick at least 6.0, to enable using bar.warp.sync to have a way to force
    +     warp convergence.  */
    +  res = MAX (res, PTX_VERSION_6_0);
    +  [...]
    +}
    [...]
    --- gcc/config/nvptx/nvptx.opt
    +++ gcc/config/nvptx/nvptx.opt
    @@ -91,3 +91,3 @@ Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
     mptx=
    -Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_6_3)
    +Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option)
     Specify the version of the ptx version to use.

> If so, I think we should mention that
> nvptx GCC has to be configured with with-arch=sm_... >= sm_53 (=
> supported version >=4.1) and, during compilation, no -march= < that
> configure-time value may be specified. (Cf. also
> https://gcc.gnu.org/install/specific.html#nvptx-x-none )

Given that GCC/nvptx generally supports 'sm_20', only the PTX ISA version
matters here, and that's all fine if just using GCC's defaults.

OK to push "Clarify libgomp nvptx 'omp_low_lat_mem_space' documentation",
see attached?


Grüße
 Thomas

From a6cdf1358c8b2f5279517ec7ebeb3336299ea928 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwi...@baylibre.com>
Date: Tue, 12 Nov 2024 09:54:35 +0100
Subject: [PATCH] Clarify libgomp nvptx 'omp_low_lat_mem_space' documentation

PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1", and
"Requires 'sm_20' or higher".  Given that GCC/nvptx generally supports
'sm_20', only the PTX ISA version matters here, and that's all fine if
just using GCC's defaults.

Follow-up to commit e9a19ead498fcc89186b724c6e76854f7751a89b
"openmp, nvptx: low-lat memory access traits".

	libgomp/
	* libgomp.texi: Clarify nvptx 'omp_low_lat_mem_space' documentation.
---
 libgomp/libgomp.texi | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6679f6da4b9..9602d70f26e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6968,8 +6968,10 @@ The implementation remark:
       memory-copy functions of the CUDA library. Higher dimensions will
       call those functions in a loop and are therefore supported.
 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
-      the @code{access} trait is set to @code{cgroup}, the ISA is at least
-      @code{sm_53}, and the PTX version is at least 4.1. The default pool size
+      the @code{access} trait is set to @code{cgroup}, and libgomp has
+      been built for PTX ISA version 4.1 or higher (such as in GCC's
+      default configuration). @c -mptx=4.1
+      The default pool size
       is 8 kiB per team, but may be adjusted at runtime by setting environment
       variable @code{GOMP_NVPTX_LOWLAT_POOL=@var{bytes}}. The maximum value is
       limited by the available hardware, and care should be taken that the
-- 
2.34.1
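
For illustration only, the version reasoning from the commit message can be
written down as a small stand-alone check.  This is a sketch, not code from
GCC or libgomp: 'struct ptx_isa_version' and 'ptx_has_dynamic_smem_size' are
hypothetical names made up for this example, and the only fact encoded is
that '%dynamic_smem_size' was introduced in PTX ISA version 4.1.

```c
#include <stdbool.h>

/* Hypothetical type for this sketch only; GCC itself uses
   'enum ptx_version' with values such as PTX_VERSION_6_0.  */
struct ptx_isa_version
{
  int major;
  int minor;
};

/* Hypothetical helper, not part of GCC or libgomp.
   Return true if PTX ISA version V is new enough to provide
   '%dynamic_smem_size', which was introduced in PTX ISA version 4.1.
   The 'sm_20' requirement is not checked, given that GCC/nvptx
   generally supports 'sm_20'.  */
static bool
ptx_has_dynamic_smem_size (struct ptx_isa_version v)
{
  return v.major > 4 || (v.major == 4 && v.minor >= 1);
}
```

Under these assumptions, the explicitly requested '-mptx=3.1' multilib
(version 3.1) fails the check, whereas the '-mptx=6.0' that the default is
raised to (version 6.0) passes it.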