On Sunday, 10 December 2023 14:29:45 CET Richard Sandiford wrote:
> Thanks for the patch and sorry for the slow review.

Sorry for my slow reaction. I needed a long vacation. For now I'll focus on
the design question wrt. multi-arch compilation.

> I can only comment on the usage of SVE, rather than on the scaffolding
> around it. Hopefully Jonathan or others can comment on the rest.

That's very useful!

> The main thing that worries me is:
>
> #if _GLIBCXX_SIMD_HAVE_SVE
> constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS/8;
> #else
> constexpr inline int __sve_vectorized_size_bytes = 0;
> #endif
>
> Although -msve-vector-bits is currently a per-TU setting, that isn't
> necessarily going to be the case in future.

This is a topic that I care about a lot... as simd user, implementer, and
WG21 proposal author. Are you thinking of a plan to implement the
target_clones function attribute for different SVE lengths? Or does it go
further than that? PR83875 raises the same issue and solution ideas for x86.
If I understand your concern correctly, then the issue you're raising exists
in the same form for x86.

If anyone is interested in working on a "translation phase 7 replacement"
for compiler flags macros, I'd be happy to give some input on what I believe
is necessary to make target_clones work with std(x)::simd. This seems to be
about constant expressions that return compiler-internal state - probably
similar to how static reflection needs to work.

For a sketch of a direction: what I'm already doing in
std::experimental::simd is to tag all non-always_inline function names with
a bitflag representing a relevant subset of -f and -m flags. That way, I'm
guarding against surprises when linking TUs compiled with different flags.
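Roughly, the idea looks like this (the names and bit assignments below are
only illustrative, not the actual implementation):

  constexpr unsigned __machine_flags()
  {
    unsigned __r = 0;
  #ifdef __ARM_NEON
    __r |= 1u << 0;
  #endif
  #ifdef __ARM_FEATURE_SVE
    __r |= 1u << 1;
  #endif
  #ifdef __ARM_FEATURE_SVE_BITS
    __r |= (__ARM_FEATURE_SVE_BITS / 128u) << 8; // encode the fixed VL
  #endif
  #ifdef __FAST_MATH__
    __r |= 1u << 2;
  #endif
    return __r;
  }

  // Non-always_inline functions carry the flag set as a template argument,
  // so it becomes part of the mangled name.
  template <unsigned _Flags = __machine_flags()>
  int __some_simd_kernel(int __x)
  { return __x; }

A TU compiled with -msve-vector-bits=256 then references a different symbol
than a TU compiled with -msve-vector-bits=512, instead of the linker silently
picking one of two incompatible definitions.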
> Ideally it would be
> possible to define different implementations of a function with
> different (fixed) vector lengths within the same TU. The value at
> the point that the header file is included is then somewhat arbitrary.
>
> So rather than have:
>
> > using __neon128 = _Neon<16>;
> > using __neon64 = _Neon<8>;
> >
> > +using __sve = _Sve<>;
>
> would it be possible instead to have:
>
>   using __sve128 = _Sve<128>;
>   using __sve256 = _Sve<256>;
>   ...etc...
>
> ? Code specialised for 128-bit vectors could then use __sve128 and
> code specialised for 256-bit vectors could use __sve256.

Hmm, as things stand we'd need two numbers, IIUC:

  _Sve<NumberOfUsedBytes, SizeofRegister>

On x86, "NumberOfUsedBytes" is sufficient, because 33-64 implies zmm
registers (and -mavx512f), 17-32 implies ymm, and <=16 implies xmm (except
where it doesn't ;) ).
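To make that concrete, the ABI tag would have to have roughly this shape
(the names and members here are invented for illustration, not taken from
the patch):

  template <int _UsedBytes, int _RegisterBytes>
  struct _SveAbi
  {
    static constexpr int _S_used_bytes = _UsedBytes;         // bytes the simd object occupies
    static constexpr int _S_register_bytes = _RegisterBytes; // __ARM_FEATURE_SVE_BITS / 8
    // element count follows from _UsedBytes; predicate layout and intrinsic
    // selection follow from _RegisterBytes
  };

  // e.g. a 16-byte simd on a 256-bit SVE target:
  using __sve128_of_256 = _SveAbi<16, 32>;

A fixed 16-byte simd is then a different type on a 256-bit machine than on a
128-bit one, which is exactly the information a single template parameter
cannot carry on SVE.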
> Perhaps that's not possible as things stand, but it'd be interesting
> to know the exact failure mode if so. Either way, it would be good to
> remove direct uses of __ARM_FEATURE_SVE_BITS from simd_sve.h if possible,
> and instead rely on information derived from template parameters.

The TS spec requires std::experimental::native_simd<int> to basically give
you the largest, most efficient, full SIMD register. (And it can't be a
sizeless type, because sizeless types don't exist in C++.) So how would you
do that without looking at __ARM_FEATURE_SVE_BITS in the simd implementation?

> It should be possible to use SVE to optimise some of the __neon*
> implementations, which has the advantage of working even for VLA SVE.
> That's obviously a separate patch, though. Just saying for the record.

I learned that NVidia Grace CPUs alias NEON and SVE registers. But I must
assume that other SVE implementations (especially those with
__ARM_FEATURE_SVE_BITS > 128) don't do that and might incur a significant
latency when moving from a NEON register to an SVE register and back (each
of which requires a store-load, IIUC). So are you thinking of implementing
everything via SVE? That would break ABI, no?

- Matthias

--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────