On 31/01/2020 13:56, Kwok Cheung Yeung wrote:
The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs
per SIMD unit. OpenMP threads or OpenACC workers must be distributed
across the SIMD units, with each thread/worker fitting entirely within a
single SIMD unit. VGPRs are shared by the kernels running in a SIMD
unit, so we can have 4 workers that each use up to 256 VGPRs, 8 workers
that each use up to 128 VGPRs, 16 workers that each use up to 64 VGPRs,
and so on.
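
The arithmetic above can be sketched in C as follows. This is only an illustration of the figures quoted in this message (4 SIMD units per compute unit, 256 VGPRs per SIMD unit). The names max_workers_for_vgprs, SIMD_UNITS_PER_CU and VGPRS_PER_SIMD are hypothetical and not taken from the patch, and the sketch ignores any allocation granularity or other occupancy limits the hardware may impose.

```c
/* Figures quoted in the message; the macro names are hypothetical.  */
#define SIMD_UNITS_PER_CU 4
#define VGPRS_PER_SIMD 256

/* Hypothetical helper: return the largest number of workers that fit in
   one compute unit when each worker needs VGPR_COUNT registers.
   Returns 0 if the count is not positive or exceeds a single SIMD
   unit's register file.  */
static int
max_workers_for_vgprs (int vgpr_count)
{
  if (vgpr_count <= 0 || vgpr_count > VGPRS_PER_SIMD)
    return 0;

  /* Each worker must fit entirely within one SIMD unit, so count the
     workers per SIMD unit first and then multiply by the unit count.  */
  return SIMD_UNITS_PER_CU * (VGPRS_PER_SIMD / vgpr_count);
}
```

Under these assumptions, max_workers_for_vgprs (256) gives 4, max_workers_for_vgprs (128) gives 8 and max_workers_for_vgprs (64) gives 16, matching the figures above.
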
If more threads/workers are requested than can be supported, then the
runtime fails with the message:

  libgomp: GCN fatal error: Asynchronous queue error
  Runtime message: HSA_STATUS_ERROR_INVALID_ISA: The instruction set
  architecture is invalid.

This patch adds code to mkoffload such that the number of VGPRs (and
SGPRs for good measure) requested by a kernel is reported to libgomp at
runtime. When launching a kernel, if libgomp detects that the number of
threads/workers exceeds what can be supported by the hardware, it
automatically scales down the number to the maximum supported value.
This behaviour can be overridden using environment variables to set an
explicit number of threads/workers (GCN_NUM_THREADS/GCN_NUM_WORKERS),
but there is not much point IMO as the kernel will just fail to run.
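
The scaling-down behaviour described above might look roughly like the following sketch. It is not the code from the patch: struct kernel_regs and limit_workers are hypothetical names, the explicit_request flag stands in for a count set through GCN_NUM_THREADS/GCN_NUM_WORKERS, and the 4 x 256 limit is simply the figure quoted earlier in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the register counts that mkoffload reports
   to libgomp for each kernel.  */
struct kernel_regs
{
  int sgpr_count;
  int vgpr_count;
};

/* Hypothetical helper: return the number of threads/workers to launch.
   An explicit request is passed through unchanged, even if the kernel
   will then fail to run; otherwise the request is capped at what the
   VGPR budget allows.  */
static int
limit_workers (const struct kernel_regs *regs, int requested,
	       bool explicit_request)
{
  if (explicit_request)
    return requested;

  /* No usable register information: leave the request alone.  */
  if (regs->vgpr_count <= 0 || regs->vgpr_count > 256)
    return requested;

  /* 4 SIMD units, each with 256 VGPRs shared between its workers.  */
  int max_workers = 4 * (256 / regs->vgpr_count);

  return requested > max_workers ? max_workers : requested;
}
```

With a kernel that needs 128 VGPRs, a request for 16 workers would be reduced to 8 in this sketch, while a request for 4 would be left untouched.
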
Tested on a GCN3 accelerator with 6 new passes and no regressions noted
in libgomp. Okay for trunk?
Kwok
gcc/
* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count
to definition of hsa_kernel_description. Parse assembly to find SGPR
and VGPR count of kernel and store in hsa_kernel_description.
libgomp/
* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
and vgpr_count fields.
(struct kernel_info): Add a field for a hsa_kernel_description.
(run_kernel): Reduce the number of threads/workers if the requested
number would require too many VGPRs.
(init_basic_kernel_info): Initialize description field with
the hsa_kernel_description entry for the kernel.
OK.
Andrew