On August 24, 2024 Tobias Burnus wrote:
[...] it documents the code added at "[patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin", https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html
Quite some time has passed and those features are now on mainline.

The attached patch is an updated version of it. It also documents the settings used to create the stream/queue object, which is mainly relevant for HSA, as the CUDA and HIP versions are pretty standard. I think everything else is also pretty standard.

BTW: For HIP on AMD, I assume that when HSA is found via dlopen, HIP will also be found via dlopen, so I shied away from wording like 'if available/found at runtime' or similar.

Comments before I commit it? I bet someone has some!

Tobias
libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn libgomp/ChangeLog: * libgomp.texi (OpenMP 5.1): Add @ref to offload-target specifics for 'interop'. (OpenMP 6.0): Mark dispatch's interop clause as implemented. (omp_get_interop_int, omp_get_interop_str, omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to Offload-Target Specifics. (Offload-Target Specifics): Document the supported OpenMP interop foreign runtimes on AMD and Nvidia GPUs. libgomp/libgomp.texi | 152 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 6 deletions(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index d1cf9be47ca..04e7ed2352c 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -314,7 +314,7 @@ The OpenMP 4.5 specification is fully supported. clauses @tab N @tab @item Indirect calls to the device version of a procedure or function in @code{target} regions @tab Y @tab -@item @code{interop} directive @tab N @tab +@item @code{interop} directive @tab Y @tab Cf. @ref{Offload-Target Specifics} @item @code{omp_interop_t} object support in runtime routines @tab Y @tab @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab @item Extensions to the @code{atomic} directive @tab Y @tab @@ -545,7 +545,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab @tab N @tab @item Semicolon-separated list to @code{uses_allocators} @tab N @tab @item New @code{need_device_addr} modifier to @code{adjust_args} clause @tab N @tab -@item @code{interop} clause to @code{dispatch} @tab N @tab +@item @code{interop} clause to @code{dispatch} @tab Y @tab @item Scope requirement changes for @code{declare_target} @tab N @tab @item @code{message} and @code{severity} clauses to @code{parallel} directive @tab N @tab @@ -3062,7 +3062,8 @@ the initial device is unspecified. 
@end multitable @item @emph{See also}: -@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2, @@ -3107,7 +3108,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3, @@ -3151,7 +3153,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4, @@ -3234,7 +3237,8 @@ a null pointer is returned. 
The effect of running this routine in a @end multitable @item @emph{See also}: -@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name} +@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6, @@ -6837,6 +6841,10 @@ The following sections present notes on the offload-target specifics @node AMD Radeon @section AMD Radeon (GCN) +@menu +* Foreign-runtime support for AMD GPUs:: +@end menu + On the hardware side, there is the hierarchy (fine to coarse): @itemize @item work item (thread) @@ -6912,10 +6920,69 @@ The implementation remark: @end itemize +@node Foreign-runtime support for AMD GPUs +@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs + +On AMD GPUs, the foreign runtimes are HIP (C++ Heterogeneous-Compute Interface +for Portability) and HSA (Heterogeneous System Architecture), +where HIP is the default. The interop object is created using OpenMP's +@code{interop} directive or, implicitly, when invoking a @code{declare variant} +procedure that has the @code{append_args} clause. In either case, the +@code{prefer_type} modifier determines whether HIP or HSA is used. + +When specifying the @code{targetsync} modifier: For HIP, a stream is +created using @code{hipStreamCreate}. For HSA, a queue is created of type +@code{HSA_QUEUE_TYPE_MULTI} with a queue size of 64. + +Invoke the @ref{Interoperability Routines} on an interop object to obtain +the following properties. For properties with integral (int), pointer (ptr), +or string (str) data type, call @code{omp_get_interop_int}, +@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively. +To each listed property, an associated named constant exists with prefix +@code{omp_ipr_}. Note that @code{device_num} is the OpenMP device number +while @code{device} is the HIP device number or HSA device handle. 
+ +@noindent +Available properties for an HIP interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{hip} +@item @code{vendor} @tab @code{int} @tab int @tab @code{1} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{amd} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hipDevice_t} @tab int @tab +@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab +@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab +@end multitable + +@noindent +Available properties for an HSA interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hsa} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{hsa} +@item @code{vendor} @tab @code{int} @tab int @tab @code{1} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{amd} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hsa_agent *} @tab ptr @tab +@item @code{device_context} @tab N/A @tab @tab +@item @code{targetsync} @tab @code{hsa_queue *} @tab ptr @tab +@end multitable + + @node nvptx @section nvptx +@menu +* Foreign-runtime support for Nvidia GPUs:: +@end menu + On the hardware side, there is the hierarchy (fine to coarse): @itemize @item thread @@ -7008,6 +7075,79 @@ The implementation remark: @end itemize +@node Foreign-runtime support for Nvidia GPUs +@subsection OpenMP @code{interop} -- Foreign-Runtime Support for Nvidia GPUs + +On Nvidia GPUs, the foreign runtimes APIs are the CUDA runtime API, the CUDA +driver 
API, and HIP, the C++ Heterogeneous-Compute Interface for Portability +that is---on CUDA-based systems---a very thin layer on top of the CUDA API. By +default, CUDA is used. The interop object is created using OpenMP's +@code{interop} directive or, implicitly, when invoking a @code{declare variant} +procedure that has the @code{append_args} clause. In either case, the +@code{prefer_type} modifier determines whether CUDA, CUDA driver, or HIP is +used. + +When specifying the @code{targetsync} modifier, a CUDA stream is created using +the @code{CU_STREAM_DEFAULT} flag. + +Invoke the @ref{Interoperability Routines} on an interop object to obtain +the following properties. For properties with integral (int), pointer (ptr), +or string (str) data type, call @code{omp_get_interop_int}, +@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively. +To each listed property, an associated named constant exists with prefix +@code{omp_ipr_}. Note that @code{device_num} is the OpenMP device number +while @code{device} is the CUDA, CUDA driver, or HIP device number. 
+ +@noindent +Available properties for a CUDA runtime API interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{cuda} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{nvidia} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{int} @tab int @tab +@item @code{device_context} @tab N/A @tab @tab +@item @code{targetsync} @tab @code{cudaStream_t} @tab ptr @tab +@end multitable + +@noindent +Available properties for a CUDA driver API interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda_driver} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{cuda_driver} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{nvidia} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{CUdevice} @tab int @tab +@item @code{device_context} @tab @code{CUcontext} @tab ptr @tab +@item @code{targetsync} @tab @code{CUstream} @tab ptr @tab +@end multitable + +@noindent +Available properties for an HIP interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{hip} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item 
@code{vendor_name} @tab @code{const char *} @tab str @tab @code{nvidia} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hipDevice_t} @tab int @tab +@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab +@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab +@end multitable + + + @c --------------------------------------------------------------------- @c The libgomp ABI @c ---------------------------------------------------------------------