On 30/11/2021 16:54, Jakub Jelinek wrote:
Why does the GCN plugin or runtime need to know those vars?
It needs to know the single array that contains their addresses of course...

With older LLVM there were issues with relocations that made it impossible to link the the offload_var_table. This is why mkoffload deletes it. I've not tried it again recently, so it's possible we could completely rework the way these are processed in the plugin, but that's the hard option.

What it currently does is a symbol lookup for each named variable listed in the C wrapper used to embed the kernel. The lookup provided by the AMD runtime ignores symbols that are not exported, even if they are present in the ELF.

Actually, you've done it in ACCEL_COMPILER only, so
I assume linking the above two sources with -fopenmp into a single
binary or shared library will still work because LTO when reading
the byte-code in will remangle the names of those variables to something
where they are unique in that single *.s (or *.ptx) it emits.
But, if you put one of those TUs into a shared library and the other
into another shared library, I don't see how it can work anymore,
because both those ELF objects which will be in data sections of those
libraries might have clashing names.

The plugin loads each image file as an independent "executable". If there are multiple images then there *will be* duplicate symbols (e.g. "init_array") but this is not a problem because they're in a different context.

If there's a problem with duplicate symbols *within* a given image then we have a bigger problem because offload_var_table is referring to them by name. As you say, I presume the LTO stream-in is fixing up such conflicts.

If GCN can't support static variables (but isn't it ELF?) and there is no
other way than sacrifice offloading from multiple shared libraries or binary
in the same process, it at least shouldn't be done for targets which don't
need it (e.g. PTX) and shouldn't be done in the pass you've done it in
(because that means it will walk all the vars for each function it
processes, rather than just once).  So, better place would be e.g.
offload_handle_link_vars in lto/*.c or so.

GCN is ELF and can support static variables just fine ... as long as you don't want to poke values into them from the outside. We do not support any sort of dynamic libraries; kernels are statically linked relocatable executables (the AMD runtime expects the binary to be a dynamic object itself, but there's nothing hunting for dependencies, or anything like that).

I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples.

It is true that my current placement visits all the symbols for every function, meaning that they are adjusted in an earlier iteration of a pass than you might expect. I couldn't find a single place that fixed this problem only in the amdgcn compiler and wasn't too late.

Do you have a suggestion how to not do this for other GPU targets? We can add another hook or macro, of course ....

Thanks for the review!

Andrew

Reply via email to