On Fri, Oct 30, 2020 at 10:48:09PM +0800, Chung-Lin Tang wrote:
> We've been going over how we should implement the requires directive in a
> somewhat more complete sense than its current state (i.e. only
> atomic_default_mem_order working).
>
> For the three clauses where the specification requires that they "must
> appear in all compilation units of a program that contain device
> constructs or device routines or in none of them":
> - reverse_offload
> - unified_address
> - unified_shared_memory
>
> The current design we're contemplating is to generate a mask variable of
> these 3 clauses for compilation units built with -fopenmp, and have it
> tagged with an attribute so that it is collected into a special section
> (e.g. ".gnu.gomp.requires"). Later, at runtime device startup, the runtime
> would check them against the capabilities of the libgomp offload target.
> Cross-checking each word (assuming it is a word that we generate for each
> compile unit) against the others can also implement the consistency
> requirement.
>
> (Actually, as a first-stage implementation, we were hoping to just have the
> special section implemented, which allows compilation of OpenMP programs
> using the requires directive, and implement any runtime checking at a later
> stage.)
>
> We hope to check with you first on any design issues. Have you given any
> thought to this directive?
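A minimal sketch, in C, of the per-TU mask and cross-check described in the quoted proposal. The bit values, the `requires_word` struct and the functions `requires_consistent` and `device_supports` are hypothetical, not libgomp's actual interface, and an ordinary array stands in for the words the linker would collect from a section such as ".gnu.gomp.requires".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bit assignments for the three clauses that must appear
   in all device-using compilation units or in none of them.  */
enum {
  REQ_REVERSE_OFFLOAD       = 1u << 0,
  REQ_UNIFIED_ADDRESS       = 1u << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 2
};

/* One word per compilation unit.  In the proposed design the compiler
   would emit this into a special section and the linker would
   concatenate them; here a plain array stands in for that section.  */
struct requires_word {
  const char *tu;   /* TU name, used only for the diagnostic */
  unsigned mask;    /* OR of the REQ_* bits seen in that TU */
};

/* Cross-check every word against the first one.  On success store the
   common mask in *common and return true; on a mismatch print a
   diagnostic, leave *common untouched and return false.  */
static bool
requires_consistent (const struct requires_word *w, size_t n,
                     unsigned *common)
{
  assert (w != NULL || n == 0);
  for (size_t i = 1; i < n; i++)
    if (w[i].mask != w[0].mask)
      {
        fprintf (stderr, "'requires' mismatch: %s has 0x%x, %s has 0x%x\n",
                 w[0].tu, w[0].mask, w[i].tu, w[i].mask);
        return false;
      }
  *common = n ? w[0].mask : 0;
  return true;
}

/* Device startup check: every required feature must be among the
   capabilities of the offload target.  */
static bool
device_supports (unsigned caps, unsigned required)
{
  return (required & ~caps) == 0;
}
```

With this layout, a TU that has device constructs but no requires directive contributes a zero word, so it is flagged by the same equality check; TUs without any device constructs would simply emit no word at all.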
I vaguely considered that each TU with such requires directives would just call, from the TU's ctors, some libgomp routine telling libgomp the bitmask requirement. Either it would be called before the devices are initialized, in which case it would result in filtering of the devices (devices not satisfying those requirements wouldn't be registered), or it would be called after the devices are loaded (e.g. due to dlopen etc.), in which case, dunno, we'd either terminate the program or just finalize those devices.

Now, you're certainly right that it is better to track it somewhere in data and just let it be resolved during linking.

Anyway, we shouldn't record these 3 requires flags anywhere in TUs that don't contain any device constructs or device routines. The current OMP_REQUIRES_TARGET_USED is set by the target construct only, while I think it applies at least to omp target{, data, update, enter data, exit data}, probably some declare variants, and various omp_target_* etc. calls (so we'd need e.g. to have some function list and compare direct calls against those functions somewhere, perhaps in gimplify).

One possible spot to encode the mask could be somewhere in the offloading LTO sections, letting mkoffload collect it, diagnose mismatches and let the runtime library know about the requirements (and let the runtime library again check those requirements across the binary and shared libraries). Just note that there can even be TUs that don't really have anything to offload but still have some device constructs or call device routines, so we'd need to force the offloading sections even for those.

As for dynamic_allocators, I'd like to implement that soon. Unfortunately the wording is bad: except for the allocate clause on the target construct, the requirement that the allocator be a constant expression when dynamic_allocators is absent talks about target regions, and is therefore something that can only be discovered at runtime rather than at compile time. So I think all we can do is emit warnings.

Jakub
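For comparison, a rough C sketch of the ctor-based alternative in the first paragraph of the reply. Everything here is hypothetical: `requires_register` stands for the libgomp routine a compiler-generated TU constructor would call, `devices_init` stands for device initialization, and the bit values are made up. Registration before initialization filters devices; a late registration (e.g. from a dlopen'ed library) finalizes devices that cannot satisfy the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments for the three clauses.  */
enum {
  REQ_REVERSE_OFFLOAD       = 1u << 0,
  REQ_UNIFIED_ADDRESS       = 1u << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 2
};

/* Hypothetical per-device record.  */
struct device {
  const char *name;
  unsigned caps;   /* REQ_* features the device supports */
  bool active;     /* registered and usable */
};

static unsigned required_mask;     /* mask from the first registering TU */
static bool required_seen;
static bool devices_initialized;

/* What each TU's ctor would call.  Returns false if this TU's mask
   disagrees with an earlier one; the caller decides how to diagnose.  */
static bool
requires_register (unsigned mask, struct device *devs, size_t ndevs)
{
  assert (devs != NULL || ndevs == 0);
  if (required_seen && mask != required_mask)
    return false;
  required_mask = mask;
  required_seen = true;
  /* Late registration: devices were already set up, so finalize
     (here: deactivate) those that cannot satisfy the mask.  */
  if (devices_initialized)
    for (size_t i = 0; i < ndevs; i++)
      if (devs[i].active && (mask & ~devs[i].caps) != 0)
        devs[i].active = false;
  return true;
}

/* Device initialization: register only devices meeting the mask.  */
static void
devices_init (struct device *devs, size_t ndevs)
{
  assert (devs != NULL || ndevs == 0);
  for (size_t i = 0; i < ndevs; i++)
    devs[i].active = (required_mask & ~devs[i].caps) == 0;
  devices_initialized = true;
}
```

Returning false on a mismatch leaves the choice between terminating and finalizing devices to the caller, mirroring the open question in the reply. Note that only TUs containing requires directives call in, so this scheme alone cannot detect the "in none of them" half of the consistency rule.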