https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121672

            Bug ID: 121672
           Summary: [OpenMP] 'declare variant' with 'construct={simd}'
                    context selector not handled
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: openmp, rejects-valid
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org, sandra at gcc dot gnu.org
  Target Milestone: ---

This is a follow-up to PR121630; see also the discussion at
 https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693247.html
and the earlier and later messages in that thread.

In a nutshell:

* OpenMP permits function variant replacement depending on the context

* For SIMD, the idea seems to be:
  - either the user lets the compiler create generic SIMD versions
    using 'declare simd'
  - or the user has a hand-optimized special version, which should get
    used via 'declare variant' and the 'simd' trait of the context selector.

* GCC contains code to handle the latter, but it is incomplete and combining
  it with adjust_args/append_args fails. (→ 'sorry' diagnostic being added
  as part of PR121630).


From c-c++-common/gomp/declare-variant-7.c
or         gcc.dg/gomp/pr95315-2.c

typedef float __v4sf __attribute__((vector_size (16)));
typedef int __v4si __attribute__((vector_size (16)));
__v4si f1 (__v4sf, __v4sf, float *);

#pragma omp declare variant (f1) \
    match (construct={parallel,for, \
                      simd(simdlen(4),notinbranch,uniform(z), \
                           aligned(z:4 * sizeof (*z)))})
int f4 (float x, float y, float *z);
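
For illustration only, a hand-written 'f1' could look roughly like the
following sketch, which uses GCC's generic vector extensions. The testcases
only declare 'f1'; the body and the arithmetic (x + y + z[0], truncated to
int) are assumptions made up for this example and are not taken from the
testsuite.

```c
/* Four-lane vector types, as in the testcase.  */
typedef float __v4sf __attribute__((vector_size (16)));
typedef int __v4si __attribute__((vector_size (16)));

/* Hypothetical hand-optimized variant: processes four lanes per call.
   'z' is uniform, i.e. the same pointer is used for all lanes.  */
__v4si
f1 (__v4sf x, __v4sf y, float *z)
{
  /* Broadcast the uniform scalar to all four lanes.  */
  __v4sf zz = { z[0], z[0], z[0], z[0] };
  __v4sf sum = x + y + zz;
  /* Lane-wise float -> int conversion (truncates toward zero).  */
  return __builtin_convertvector (sum, __v4si);
}
```

In real code the body would more likely use target intrinsics or assembly;
the point is only that the variant takes and returns vector types while the
base function 'f4' is scalar.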


The idea seems to be that this adds 'f1' as a variant of the base function 'f4'
such that the SIMD version gets used automatically in code like:

  #pragma omp parallel for simd aligned (w:4 * sizeof (float))
  for (int i = 0; i < 1024; i++)
    if (ret_false ())
      x[i] = f4 (y[i], z[i], w);
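
For contrast, the other route, where the compiler generates the SIMD
versions itself, could be sketched as below. The body of 'f4' and the
wrapper 'apply_f4' are hypothetical and exist only for illustration; the
clauses are a reduced form of those in the selector above (no simdlen or
aligned, to keep the sketch target-independent).

```c
#include <stddef.h>

/* Ask the compiler to create SIMD clones of the scalar function.
   Without -fopenmp (or -fopenmp-simd) the pragmas are simply ignored.  */
#pragma omp declare simd notinbranch uniform(z)
int
f4 (float x, float y, float *z)
{
  /* Made-up scalar body, for illustration only.  */
  return (int) (x + y + z[0]);
}

/* Hypothetical caller: the vectorizer may replace the scalar call
   with one of the generated clones inside the simd loop.  */
void
apply_f4 (int *x, float *y, float *z, float *w, size_t n)
{
#pragma omp parallel for simd
  for (size_t i = 0; i < n; i++)
    x[i] = f4 (y[i], z[i], w);
}
```

With 'declare variant' and 'construct={simd}', the call would instead be
expected to resolve to the user-provided vector function.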

* * *

Unfortunately, the GCC implementation is incomplete and not actually active.

Moreover, the specification is not as clear as it should be and has actually
regressed; see the following OpenMP spec issue, filed yesterday:

"[C++] Issues with non-template 'declare_variant' resolution" (Issue #4583)
https://github.com/OpenMP/spec/issues/4583

* * *

Historic background:

* Quoting from https://github.com/OpenMP/spec/issues/1652 (aka TRAC 720):

"Some OpenMP directives like "omp declare simd" or "omp target" force
 the compiler to generate new variants of existing functions that are
 specialized for a given context (e.g., a particular device or vector ISA).
"In some cases, users have coded themselves those variants (typically because
 they have a hand-optimized version) but currently OpenMP only allows to
 specify a variant that could then be used directly by the compiler in the
 limited context of the implements clause of the declare target construct.
"This ticket provides a new directive, "declare variant", that allows to inform
 the compiler which variants of a given function exist and in which context it
 would be appropriate for the compiler to use them."

Associated pull request:
  https://github.com/OpenMP/spec/pull/508

The discussion seemingly happened mostly at an OpenMP face-to-face meeting
in Austin, according to the meeting notes (see the ones for Oct 2, 2018).
Unfortunately, there seem to be notes for only some of the days, and while
there are a few PDF attachments, none seem to exist for 'declare variant'.
Nor does the GitHub-converted issue / pull request provide much detail.

Quotes from yesterday's email thread (see link above), but related to the
old F2F meetings:

"At that meeting based on external feedback from Jason Merrill it was even
 changed so that declare variant is marking the base calls with name of
 variant to be looked up (note, at that point, not at the call site) rather
 than the previous design which marked the variant calls to be variants for
 base."

and

"Users who don't want to mess with the details can just use declare simd
 directive; if they want to use it only in some cases and in other cases call
 something else, they can mix it up with declare variant, declare variant on
 the base with variant having declare simd, just they can't use the simd
 construct in the selector.
 This simd handling was meant e.g. for users who want to write extra
 optimized versions of the vectorized code in assembly or using say x86
 intrinsics."

* * *

EXPECTED / TO DO:

* Actually make the context selection work for 'construct={simd}'

* While doing so, watch out for the following items. Each must either be
  implemented or be rejected with a 'sorry', or with a regular diagnostic
  if it is implemented but not supported by the hardware target/ABI:

  - argument checking to ensure that it is indeed compatible with the
    base function + trait properties.

  - the interaction with 'declare simd' by the user for both the
    base function and the variant function

  - 'adjust_args' and 'append_args' clauses


Once implemented, the documentation should be updated.
Most obvious place, IMHO, would be
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html
Namely, state how to use it, when it is used, and which constraints apply.


Otherwise, quoting again from the current thread, we need to
handle & watch out for:

"The C case is easier, there is [no] function overloading, so for the simd
 trait case it is about just verification whether the function prototype
 matches the right types of the declare simd variant.  Slightly more
 complicated by the fact that we don't have just one variant as Intel has,
 but several, so we need to figure out which one of those it is and remember
 it for vectorization purposes.  And need to deal even with functions with
 the right VECTOR_TYPEs but without necessary ISA enabled where it actually
 is passed differently, guess we want to error on those if TYPE_MODE of the
 VECTOR_TYPEs is not the expected mode even if it is a vector type with
 the right element type and number of elements.

"The C++ case is harder, we need to do name lookup with the right argument
 types, so we need to iterate over the declare simd variants for the target
 for the given simdlen/{,not}inbranch and look up all of them and those that
 we find and are ok remember, those which aren't find ignore silently unless
 none are found.  Furthermore, a question is what exact element type to
 choose, e.g. for declare simd we just use integral element type for pointer
 types and it is ok to require users to do that, but shall it be unsigned or
 signed element type in that case, and which of say unsigned long or unsigned
 long long if they have the same mode.  E.g. the Intel vector ABI is
 officially defined in terms of __m128{,d,i}/__m256{,d,i}/__m512{,d,i} I think,
 so we could be using the signedness matching those types.

"For Fortran unsure, it has some kind of overloading with interfaces,
 though unsure if we support generic vectors in the FE at all.

"All even more complicated by offloading but guess we already have big
 problems with mixing declare simd and calls from target regions."

* * *

Note that there are a couple of conditions/restrictions related to the 'simd'
context selector, in particular for use with 'declare variant':

"Some properties of the simd trait selector have special rules to match the
properties of the simd trait:
• The simdlen(N) property of the trait selector matches the simdlen(M) trait of
  the OpenMP context if M is a multiple of N; and
• The aligned(list:N) property of the trait selector matches the
  aligned(list:M) trait of the OpenMP context if N is a multiple of M."
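
The two matching rules above can be written down as simple predicates.
This is only a sketch; the helper names are hypothetical and do not
correspond to functions in GCC.

```c
#include <stdbool.h>

/* simdlen(N) in the selector matches simdlen(M) in the context
   if M is a multiple of N.  */
static bool
simdlen_matches (unsigned selector_n, unsigned context_m)
{
  return selector_n != 0 && context_m % selector_n == 0;
}

/* aligned(list:N) in the selector matches aligned(list:M) in the
   context if N is a multiple of M.  Note the reversed direction.  */
static bool
aligned_matches (unsigned selector_n, unsigned context_m)
{
  return context_m != 0 && selector_n % context_m == 0;
}
```

For the example further up, the selector has aligned(z:4 * sizeof (*z)),
i.e. 16 bytes, and the loop has aligned (w:4 * sizeof (float)), also 16
bytes, so aligned_matches (16, 16) holds. A selector with simdlen(4) would
match a context with simdlen(8), but not the other way round.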


For metadirectives:

"Restrictions to the when clause are as follows: […]
• context-selector must not specify any properties for the simd trait
  selector."


For delimited begin/end declare variant:

"The restrictions to begin declare_variant directive are as follows:
• match clause must not contain a simd trait selector."


"15.1 OpenMP Contexts":
"1. For procedures with a declare_simd directive, the simd trait is added to
 the beginning of the construct trait set as c1 for any generated SIMD
 versions so the total size of the trait set is increased by one."

"The simd trait is a clause-list trait that is defined with properties that
 match the clauses that can be specified on the declare_simd directive with
 the same names and semantics. The simd trait defines at least the simdlen
 property and one of the inbranch or notinbranch properties. Traits in the
 construct trait set other than simd are non-property traits."

"Each trait-property of the simd trait selector is a trait-property-clause. The
 syntax is the same as for a valid clause of the declare_simd directive
 and the restrictions on the clauses from that directive apply. The construct
 selector set is an ordered list c1, ..., cN."


And finally, applicable to 'declare simd':

"If ... the simdlen clause is not specified, the number of concurrent arguments
 for the function is implementation defined."

However, 'declare variant' has to deal with an omitted 'simdlen' as well.
