On Wednesday, 27 March 2024 14:34:52 CET Richard Sandiford wrote:
> Matthias Kretz <m.kr...@gsi.de> writes:
> > The big issue here is that, IIUC, a user (and the simd library) cannot
> > do the right thing at the moment. There simply isn't enough context
> > information available when parsing the <experimental/simd> header.
> > I.e. on definition of the class template there's no facility to take
> > target_clones or SME "streaming" mode into account. Consequently, if
> > we want the library to be fit for SME, then we need more language
> > extension(s) to make it work.
>
> Yeah.  I think the same applies to plain SVE.
By "plain SVE" you mean the *scalable* part of it, right? BTW, I've
experimented with implementing simd<T> basically as

template <typename T, int N>
class simd
{
  alignas(bit_ceil(sizeof(T) * N)) T data[N];
  // ...
};

See here: https://compiler-explorer.com/z/WW6KqanTW

Maybe the compiler can get better at optimizing this approach. But for now
it's not a solution for a *scalable* variant, because all code is going to
be load/store bound from the get-go.

@Srinivas: See the guard variables for __index0123? They need to go. I
believe you can and should declare them `constexpr` (small example at the
end of this mail).

> It seems reasonable to have functions whose implementation is specialised
> for a specific SVE length, with that function being selected at runtime
> where appropriate.  Those functions needn't (in principle) be in separate
> TUs.  The “best” definition of native<float> then becomes a per-function
> property rather than a per-TU property.

Hmm, I never considered this; but can one actually write fixed-length SVE
code if -msve-vector-bits is not given? Then it's certainly possible to
write a single TU with a runtime dispatch for all the different SVE widths.
(This is less interesting on x86, where we need to dispatch on ISA
extensions *and* vector width. It's much simpler (and safer) to compile a TU
multiple times, each restricted to a certain set of ISA extensions, and then
dispatch to the right translation from some general code section.)

> As you note later, I think the same thing would apply to x86_64.

Yes. I don't think "same" is the case (yet), but it's very similar. Once ARM
is at SVE9 :) and binaries need to support HW from SVE2 up to SVE9, it gets
closer to "same".

> > The big issue I see here is that currently all of std::* is declared
> > without an arm_streaming or arm_streaming_compatible. Thus, IIUC, you
> > can't use anything from the standard library in streaming mode. Since
> > that also applies to std::experimental::simd, we're not creating a new
> > footgun, only missing out on potential users?
>
> Kind-of.  However, we can inline a non-streaming function into a streaming
> function if that doesn't change defined behaviour.  And that's important
> in practice for C++, since most trivial inline functions will not be
> marked streaming-compatible despite being so in practice.

Ah, good to know that the compiler takes a pragmatic approach here. But I
imagine this could become a source of confusion for users.

> > [...]
> > the compiler *must* virally apply target_clones to all functions it
> > calls. And member functions must either also get cloned as functions,
> > or the whole type must be cloned (as in the std::simd case, where the
> > sizeof needs to change). 😳
>
> Yeah, tricky :)
>
> It's also not just about vector widths.  The target-clones case also has
> the problem that you cannot detect at include time which features are
> available.  E.g. “do I have SVE2-specific instructions?” becomes a
> contextual question rather than a global question.
>
> Fortunately, this should just be a missed optimisation.  But it would be
> nice if uses of std::simd in SVE2 clones could take advantage of SVE2-only
> instructions, even if SVE2 wasn't enabled at include time.

Exactly. Even if we solve the scalable vector-length question, the
target_clones question stays relevant. So far my best answer, for x86 at
least, is to compile the SIMD code multiple times into different shared
libraries and then let the dynamic linker pick the right library variant
depending on the CPU. I'd be happy to have something simpler that works
right out of the box.
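For reference, roughly what the per-function route could look like on x86
with GCC's target_clones. This is only a sketch; the kernel `scale` and the
ISA list are made up for illustration, not something from the library:

#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// GCC emits one clone per listed target plus an ifunc resolver; the dynamic
// loader then picks the variant matching the host CPU (via ifunc).
__attribute__((target_clones("default", "avx2", "avx512f")))
void scale(float* data, std::size_t n, float factor)
{
  using V = stdx::native_simd<float>;  // width was fixed when the header was
                                       // parsed, not per clone
  std::size_t i = 0;
  for (; i + V::size() <= n; i += V::size())
  {
    V v(&data[i], stdx::element_aligned);
    v *= factor;
    v.copy_to(&data[i], stdx::element_aligned);
  }
  for (; i < n; ++i)  // scalar epilogue
    data[i] *= factor;
}

The resolver picks a clone per CPU, but note the comment on native_simd: its
width was baked in at include time, so the avx2/avx512f clones don't
automatically get wider vectors; that is exactly the missed optimisation
discussed above, and everything the kernel calls must be inlined or cloned
as well.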
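And to make the __index0123 remark concrete, a tiny stand-alone illustration
(not the actual libstdc++ code; vec4 and make_index0123 are invented for the
example) of why `constexpr` makes the guard variables disappear:

struct vec4 { int v[4]; };

// Defined elsewhere and not constexpr (think of an intrinsic like
// svindex_s32): a call to it is not a constant expression.
vec4 make_index0123();

// Dynamic initialization: the compiler typically needs an init function plus
// a guard variable (mangled _ZGV...) so the initializer runs exactly once,
// no matter how many TUs contain the definition.
inline const vec4 index0123_guarded = make_index0123();

// Constant initialization: the bytes go straight into .rodata, no guard and
// no runtime initialization at all.
inline constexpr vec4 index0123 = {{0, 1, 2, 3}};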
Best,
  Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────