On 15/03/2024 13:56, Tobias Burnus wrote:
Hi Andrew,
Andrew Stubbs wrote:
This is more-or-less what I was planning to do myself, but as I want
to include all the other features that get parametrized in gcn.cc,
gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet.
Unfortunately, I think the gcn.opt and config.gcc will always need
manually updating, but if that's all it'll be an improvement.
Well, for .opt see how nvptx does it – it actually generates an .opt file.
I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;
I concur – I was initially thinking of reporting the device name
("Unsupported %s") but I then realized that the agent returns a string
while only for GCC generated files (→ eflag) the hexcode is used. Thus,
I ended up not using it.
Ultimately, I want to replace many of the conditionals like
"TARGET_CDNA2_PLUS" from the code and replace them with feature flags
derived from a def file, or at least a header file. We've acquired too
many places where there are unsearchable conditionals that need
finding and fixing every time a new device comes along.
I was thinking of having more flags, but those where the only ones
required for the two files.
I had imagined that this .def file would exist in gcc/config/gcn, but
you've placed it in libgomp.... maybe it makes sense to have multiple
such files if they contain very different data, but I had imagined one
file and I'm not sure that the compiler definitions live in libgomp.
There is already:
gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"
gcc/config/gcn/gcn-run.cc:#include
"../../../libgomp/config/gcn/libgomp-gcn.h"
gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"
gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"
But there is also the reverse:
libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"
libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"
lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"
If you add more items, it is probably better to have it under
gcc/config/gcn/ - and I really prefer a single file for all.
* * *
Talking about feature sets: This would be a bit like LLVM (see below)
but I think they have a bit too much indirections. But I do concur that
we need to consolidate the current support – and hopefully make it
easier to keep adding more GPU support; we seem to have already covered
a larger chunk :-)
I also did wonder whether we should support, e.g., running a gfx1100
code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively,
we could keep the current check which requires an exact match.
We didn't invent that restriction; the runtime won't let you do it. We
only have the check because the HSA/ROCr error message is not very
user-friendly.
BTW: I do note that looking at the feature sets in LLVM that all GFX110x
GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and
FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the
FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons,
FeatureISAVersion11_Generic only consists of two of those bugs (it
doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that
consistent. Maybe the workaround has issues elsewhere? If so, a generic
-march=gfx11 might be not as useful as one might hope for.
* * *
If I look at LLVM's
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td ,
they first define several features – like 'FeatureUnalignedScratchAccess'.
Then they combine them like in:
def FeatureISAVersion11_Common ... [FeatureGFX11, ...
FeatureAtomicFaddRtnInsts ...
And then they use those to map them to feature sets like:
def FeatureISAVersion11_0_Common ...
listconcat(FeatureISAVersion11_Common.Features,
[FeatureMSAALoadDstSelBug ...
And for gfx1103:
def FeatureISAVersion11_0_3 : FeatureSet<
!listconcat(FeatureISAVersion11_0_Common.Features,
[])>;
The mapping to gfx... names then happens in
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td such as:
def : ProcessorModel<"gfx1103", GFX11SpeedModel,
FeatureISAVersion11_0_3.Features
>;
Or for the generic one, i.e.:
// [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
def : ProcessorModel<"gfx11-generic", GFX11SpeedModel,
FeatureISAVersion11_Generic.Features
LLVM also has some generic flags like the following in
https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/TargetParser.cpp
{{"gfx1013"}, {"gfx1013"}, GK_GFX1013,
FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK|FEATURE_WGP},
I hope that this will give some inspiration – but I assume that at least
the initial implementation will be much shorter.
Yeah, we can have one macro for each arch, or multiple macros for
building different tables. First one seems easier but less readable,
second one will need some thinking about. Probably best to keep it
simple though.
Andrew