https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121672
Bug ID: 121672
Summary: [OpenMP] 'declare variant' with 'construct={simd}' context selector not handled
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: openmp, rejects-valid
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, sandra at gcc dot gnu.org
Target Milestone: ---

This is a follow-up to PR121630; see also the discussion in the thread
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693247.html
plus earlier and later messages in that thread.

In a gist:

* OpenMP permits function variant replacement depending on the context
* For SIMD, the idea seems to be:
  - either the user lets the compiler create generic SIMD versions
    using 'declare simd'
  - or the user has a hand-optimized special version, which should get
    used via 'declare variant' and the 'simd' trait of the context
    selector.
* GCC contains code to handle the latter, but it is incomplete and
  combining it with adjust_args/append_args fails.
  (→ 'sorry' diagnostic being added as part of PR121630).

From c-c++-common/gomp/declare-variant-7.c or gcc.dg/gomp/pr95315-2.c

  typedef float __v4sf __attribute__((vector_size (16)));
  typedef int __v4si __attribute__((vector_size (16)));
  __v4si f1 (__v4sf, __v4sf, float *);
  #pragma omp declare variant (f1) \
    match (construct={parallel,for, simd(simdlen(4),notinbranch,uniform(z), \
           aligned(z:4 * sizeof (*z)))})
  int f4 (float x, float y, float *z);

The idea seems to be that this adds 'f1' as a variant of the base function
'f4' such that the SIMD version gets used automatically in code like:

  #pragma omp parallel for simd aligned (w:4 * sizeof (float))
  for (int i = 0; i < 1024; i++)
    if (ret_false ())
      x[i] = f4 (y[i], z[i], w);

* * *

Unfortunately, the GCC implementation is incomplete and not actually active.
The specification is also not as clear as it should be and has actually
regressed; see the following OpenMP spec issue, filed yesterday:
"[C++] Issues with non-template 'declare_variant' resolution" (Issue #4583)
https://github.com/OpenMP/spec/issues/4583

* * *

Historic background:

* Quoting from https://github.com/OpenMP/spec/issues/1652 (aka TRAC 720):

"Some OpenMP directives like "omp declare simd" or "omp target" force the
compiler to generate new variants of existing functions that are specialized
for a given context (e.g., a particular device or vector ISA).

"In some cases, users have coded themselves those variants (typically
because they have a hand-optimized version) but currently OpenMP only allows
to specify a variant that could then be used directly by the compiler in the
limited context of the implements clause of the declare target construct.

"This ticket provides a new directive, "declare variant", that allows to
inform the compiler which variants of a given function exist and in which
context it would be appropriate for the compiler to use them."

Associated pull request: https://github.com/OpenMP/spec/pull/508

The discussion seemingly happened mostly at an OpenMP face-to-face meeting
in Austin, according to the meeting notes (see the ones for Oct 2, 2018).
Unfortunately, there seem to be notes for only some of the days, and while
there are a very few PDF attachments, none seem to exist for
'declare variant'. Nor does the GitHub-converted issue / pull request
provide much detail.

Quotes from yesterday's email thread (see link above), but related to the
old F2F meetings:

"At that meeting based on external feedback from Jason Merrill it was even
changed so that declare variant is marking the base calls with name of
variant to be looked up (note, at that point, not at the call site) rather
than the previous design which marked the variant calls to be variants for
base."
and

"Users who don't want to mess with the details can just use declare simd
directive; if they want to use it only in some cases and in other cases call
something else, they can mix it up with declare variant, declare variant on
the base with variant having declare simd, just they can't use the simd
construct in the selector. This simd handling was meant e.g. for users who
want to write extra optimized versions of the vectorized code in assembly or
using say x86 intrinsics."

* * *

EXPECTED / TO DO:

* Actually make the context selection work for 'construct={simd}'
* While doing so, watch out for and handle the following (implement it, or
  ensure that a 'sorry' is printed, or a diagnostic if it is implemented but
  not supported by the hardware target/ABI):
  - argument checking to ensure that the variant is indeed compatible with
    the base function + trait properties.
  - the interaction with 'declare simd' by the user for both the
    base function and the variant function
  - 'adjust_args' and 'append_args' clauses

Once implemented, the documentation should be updated. The most obvious
place, IMHO, would be
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html
Namely, state how to use it, when it is used, and the constraints.

Otherwise, quoting again from the current thread, we need to handle &
watch out for:

"The C case is easier, there is function overloading, so for the simd trait
case it is about just verification whether the function prototype matches
the right types of the declare simd variant. Slightly more complicated by
the fact that we don't have just one variant as Intel has, but several, so
we need to figure out which one of those it is and remember it for
vectorization purposes. And need to deal even with functions with the right
VECTOR_TYPEs but without necessary ISA enabled where it actually is passed
differently, guess we want to error on those if TYPE_MODE of the
VECTOR_TYPEs is not the expected mode even if it is a vector type with the
right element type and number of elements.
"The C++ case is harder, we need to do name lookup with the right argument
types, so we need to iterate over the declare simd variants for the target
for the given simdlen/{,not}inbranch and look up all of them and those that
we find and are ok remember, those which aren't find ignore silently unless
none are found. Furthermore, a question is what exact element type to
choose, e.g. for declare simd we just use integral element type for pointer
types and it is ok to require users to do that, but shall it be unsigned or
signed element type in that case, and which of say unsigned long or unsigned
long long if they have the same mode. E.g. the Intel vector ABI is
officially defined in terms of __m128{,d,i}/__m256{,d,i}/__m512{,d,i} I
think, so we could be using the signedness matching those types.

"For Fortran unsure, it has some kind of overloading with interfaces, though
unsure if we support generic vectors in the FE at all.

"All even more complicated by offloading but guess we already have big
problems with mixing declare simd and calls from target regions."

* * *

Note that there are a couple of conditions/restrictions related to the
'simd' context selector, in particular for use with 'declare variant':

"Some properties of the simd trait selector have special rules to match the
properties of the simd trait:
• The simdlen(N) property of the trait selector matches the simdlen(M) trait
  of the OpenMP context if M is a multiple of N; and
• The aligned(list:N) property of the trait selector matches the
  aligned(list:M) trait of the OpenMP context if N is a multiple of M."

For metadirectives:

"Restrictions to the when clause are as follows: […]
• context-selector must not specify any properties for the simd trait
  selector."

For delimited begin/end declare variant:

"The restrictions to begin declare_variant directive are as follows:
• match clause must not contain a simd trait selector."

"15.1 OpenMP Contexts":

"1.
For procedures with a declare_simd directive, the simd trait is added to the
beginning of the construct trait set as c1 for any generated SIMD versions
so the total size of the trait set is increased by one."

"The simd trait is a clause-list trait that is defined with properties that
match the clauses that can be specified on the declare_simd directive with
the same names and semantics. The simd trait defines at least the simdlen
property and one of the inbranch or notinbranch properties. Traits in the
construct trait set other than simd are non-property traits."

"Each trait-property of the simd trait selector is a trait-property-clause.
The syntax is the same as for a valid clause of the declare_simd directive
and the restrictions on the clauses from that directive apply. The construct
selector set is an ordered list c1, ..., cN."

And finally, applicable to 'declare simd':

"If ... the simdlen clause is not specified, the number of concurrent
arguments for the function is implementation defined."

However, 'declare variant' has to deal with an omitted 'simdlen' as well.
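
For illustration only, the two special matching rules quoted earlier for the
simdlen and aligned properties could be sketched as below. The function names
simdlen_matches and aligned_matches are hypothetical and are not existing GCC
functions; the sketch also assumes that both values are present and positive,
so it does not cover the omitted-'simdlen' case mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): simdlen(N) in the trait selector
   matches simdlen(M) in the OpenMP context if M is a multiple of N.  */
bool
simdlen_matches (unsigned selector_n, unsigned context_m)
{
  /* Assumption: both lengths are given and non-zero.  */
  assert (selector_n > 0 && context_m > 0);
  return context_m % selector_n == 0;
}

/* Hypothetical helper (not part of GCC): aligned(list:N) in the trait
   selector matches aligned(list:M) in the OpenMP context if N is a
   multiple of M.  Note the direction is reversed compared to simdlen.  */
bool
aligned_matches (unsigned selector_n, unsigned context_m)
{
  /* Assumption: both alignments are given and non-zero.  */
  assert (selector_n > 0 && context_m > 0);
  return selector_n % context_m == 0;
}
```

With the declare-variant-7.c example, the selector has
aligned(z:4 * sizeof (*z)), i.e. 16 for a 4-byte float, and the loop has
aligned (w:4 * sizeof (float)), also 16, so aligned_matches (16, 16) holds;
likewise, a selector simdlen(4) would match a context simdlen of 4 or 8, but
not 2.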