On Fri, Feb 20, 2026 at 11:30 AM Matthias Kretz <[email protected]> wrote:
> This goes on top of the main [simd] patch and is in preparation for an
> implementation of [simd.permute.dynamic].
>
> How could/should I test this change? I have a test that compiles
>
I think we are good with tests that only check that the result is correct,
i.e. that the elements have the correct values.
>
> auto f10(simd::vec<short, 2> a, simd::vec<short, 2> b,
>          simd::vec<short, 2> c, simd::vec<short, 2> d,
>          simd::vec<short, 8> e) {
>   g(simd::cat(a, b, c, d, e));
> }
>
> to
> vmovdqa xmm3, xmm0
> vmovd xmm1, esi
> vmovd xmm0, edi
> vinsertps xmm0, xmm0, xmm1, 16
> vmovd xmm2, ecx
> vmovd xmm1, edx
> vinsertps xmm1, xmm1, xmm2, 16
> vmovq xmm0, xmm0
> vmovq xmm1, xmm1
> vpunpcklqdq xmm0, xmm0, xmm1
> vmovdqa xmm1, xmm3
> vmovdqa xmm0, xmm0
> vperm2i128 ymm0, ymm0, ymm1, 32
>
> The 2x insertps -> unpck -> 128-bit concat sequence shows that it performs
> the expected tree of concatenations. But I really hope this will turn into
> vmovd xmm2, edi
> vmovd xmm1, edx
> vpinsrd xmm2, xmm2, esi, 1
> vpinsrd xmm1, xmm1, ecx, 1
> vpunpcklqdq xmm1, xmm2, xmm1
> vinserti128 ymm0, ymm1, xmm0, 1
>
> at some point. I don't think we want such detailed code-gen tests here. A
> test like this belongs in gcc/testsuite.
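A value-only test along the lines suggested above could fill the inputs of the f10 example with consecutive values and compare the concatenated result, element by element, against a reference that does not depend on how the concatenation is grouped. This is a standalone sketch under stated assumptions: `reference_cat` and `test_f10_values` are hypothetical helpers, std::array stands in for simd::vec, and the `cat` callable is where a real test would invoke simd::cat.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the expected result of cat(parts...), built one
// argument at a time so it does not depend on any concat tree shape.
template <typename T, std::size_t... Ns>
std::vector<T> reference_cat(const std::array<T, Ns>&... parts) {
  std::vector<T> out;
  (out.insert(out.end(), parts.begin(), parts.end()), ...);
  return out;
}

// Hypothetical value-only test shaped like f10: sizes (2, 2, 2, 2, 8) filled
// with 0, 1, 2, ... so that any misplaced element breaks the sequence.
// `cat` must return something that provides size() and operator[].
template <typename Cat>
void test_f10_values(Cat cat) {
  const std::array<short, 2> a{0, 1}, b{2, 3}, c{4, 5}, d{6, 7};
  const std::array<short, 8> e{8, 9, 10, 11, 12, 13, 14, 15};
  const auto result = cat(a, b, c, d, e);
  const std::vector<short> expected = reference_cat(a, b, c, d, e);
  assert(result.size() == expected.size());
  for (std::size_t i = 0; i < expected.size(); ++i)
    assert(result[i] == expected[i]);
}
```

In a libstdc++ test, `cat` would wrap the simd::cat call under test; how its lanes are read back (for example via operator[]) is an assumption left open here.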
>
> ---
>
> cat(a, b, c, d), where each argument is e.g. 2 elements wide, would fold
> from the left before this change: (2, 2, 2, 2) -> (4, 2, 2) -> (6, 2) ->
> (8). It is better for ILP (and to avoid load and store instructions) to
> go via (4, 2, 2) -> (4, 4) -> (8), because the second 4-element half can
> be built independently of the first.
>
> In theory, for an even larger number of arguments, the current
> implementation still isn't good enough. But passing that many arguments
> is something users shouldn't be doing anyway.
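The difference between the two fold orders can be modelled by tracking only the argument sizes. The sketch below is a toy model, not the libstdc++ code: `step_left`, `step_tree` and `trace` are hypothetical names, and `step_tree` encodes the condition from the new overload's requires-clause (the first size is a power of two and at least the sum of the next two).

```cpp
#include <cassert>
#include <vector>

// Toy model: each entry is the element count of one remaining cat() argument.
using Sizes = std::vector<int>;

bool is_pow2(int x) { return x > 0 && (x & (x - 1)) == 0; }

// Old behaviour: always merge the first two arguments (left fold).
Sizes step_left(Sizes v) {
  assert(v.size() >= 2);
  v[0] += v[1];
  v.erase(v.begin() + 1);
  return v;
}

// New behaviour: if the first size is a power of two and at least as large as
// the next two combined, merge the second and third first; that inner merge
// does not depend on the first argument. Otherwise fall back to step_left.
Sizes step_tree(Sizes v) {
  assert(v.size() >= 2);
  if (v.size() >= 3 && is_pow2(v[0]) && v[0] >= v[1] + v[2]) {
    v[1] += v[2];
    v.erase(v.begin() + 2);
    return v;
  }
  return step_left(v);
}

// Apply `step` until a single argument is left, recording every shape.
template <typename Step>
std::vector<Sizes> trace(Sizes v, Step step) {
  std::vector<Sizes> shapes;
  shapes.push_back(v);
  while (v.size() > 1) {
    v = step(v);
    shapes.push_back(v);
  }
  return shapes;
}
```

With this model, `trace({2, 2, 2, 2}, step_left)` goes through (4, 2, 2), (6, 2), (8), while `step_tree` goes through (4, 2, 2), (4, 4), (8); (2, 2, 2, 2, 8) goes through (4, 2, 2, 8), (4, 4, 8), (8, 8), (16), matching the strategy comment in the patch.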
>
> Signed-off-by: Matthias Kretz <[email protected]>
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/vec_ops.h (__vec_concat_sized): Add an overload
> that concatenates the second and third operands first if the size
> of the first is a power of two no smaller than their combined size.
> ---
> libstdc++-v3/include/bits/vec_ops.h | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/vec_ops.h b/libstdc++-v3/include/bits/vec_ops.h
> index 0e89c89b7af5..e5bf2f1497cd 100644
> --- a/libstdc++-v3/include/bits/vec_ops.h
> +++ b/libstdc++-v3/include/bits/vec_ops.h
> @@ -187,7 +187,30 @@ __vec_concat(_TV __a, _TV __b)
> * with the elements from applying this function recursively to @p __rest.
> *
> * @pre _N0 <= __width_of<_TV0> && _N1 <= __width_of<_TV1> && _Ns <= __width_of<_TVs> && ...
> + *
> + * Strategy: Aim for a power-of-2 tree concat. E.g.
> + * - cat(2, 2, 2, 2) -> cat(4, 2, 2) -> cat(4, 4)
> + * - cat(2, 2, 2, 2, 8) -> cat(4, 2, 2, 8) -> cat(4, 4, 8) -> cat(8, 8)
> */
> + template <int _N0, int _N1, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
> + __vec_builtin... _TVs>
> + [[__gnu__::__always_inline__]]
> + constexpr __vec_builtin_type<__vec_value_type<_TV0>,
> + __bit_ceil(unsigned(_N0 + (_N1 + ... + _Ns)))>
> + __vec_concat_sized(const _TV0& __a, const _TV1& __b, const _TVs&... __rest);
> +
> + template <int _N0, int _N1, int _N2, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
> + __vec_builtin _TV2, __vec_builtin... _TVs>
> + requires (__has_single_bit(unsigned(_N0))) && (_N0 >= (_N1 + _N2))
> + [[__gnu__::__always_inline__]]
> + constexpr __vec_builtin_type<__vec_value_type<_TV0>,
> + __bit_ceil(unsigned(_N0 + _N1 + (_N2 + ... + _Ns)))>
> + __vec_concat_sized(const _TV0& __a, const _TV1& __b, const _TV2& __c, const _TVs&... __rest)
>
I do not think that this should be a separate overload; it should rather be
another if constexpr branch in the default implementation.
> + {
> + return __vec_concat_sized<_N0, _N1 + _N2, _Ns...>(
> + __a, __vec_concat_sized<_N1, _N2>(__b, __c), __rest...);
> + }
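One possible reading of the suggestion above, sketched outside libstdc++: a single variadic function whose `if constexpr` chain handles the two-argument case, the new "merge the second and third argument first" case, and the old left fold. Everything here is a hypothetical model (std::array instead of vector builtins, `concat2` and `concat_sized` instead of `__vec_concat` and `__vec_concat_sized`); only the reordering condition is copied from the patch's requires-clause.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for concatenating two vector builtins.
template <typename T, std::size_t N, std::size_t M>
std::array<T, N + M> concat2(const std::array<T, N>& a,
                             const std::array<T, M>& b) {
  std::array<T, N + M> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i];
  for (std::size_t i = 0; i < M; ++i) r[N + i] = b[i];
  return r;
}

constexpr bool is_pow2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

template <std::size_t First, std::size_t... Rest>
constexpr std::size_t first_of() { return First; }

// One function instead of two overloads; the branches mirror the patch.
template <typename T, std::size_t N0, std::size_t N1, std::size_t... Ns>
auto concat_sized(const std::array<T, N0>& a, const std::array<T, N1>& b,
                  const std::array<T, Ns>&... rest) {
  if constexpr (sizeof...(Ns) == 0) {
    return concat2(a, b);
  } else if constexpr (is_pow2(N0) && N0 >= N1 + first_of<Ns...>()) {
    // Merge b with the next argument first; this does not depend on a.
    return std::apply(
        [&](const auto& c, const auto&... tail) {
          return concat_sized(a, concat2(b, c), tail...);
        },
        std::tie(rest...));
  } else {
    // Default: fold from the left.
    return concat_sized(concat2(a, b), rest...);
  }
}

// Usage sketch: sizes (2, 2, 2, 2) holding 0..7 must come back in order.
inline void example_concat_sized() {
  const std::array<short, 2> a{0, 1}, b{2, 3}, c{4, 5}, d{6, 7};
  const std::array<short, 8> r = concat_sized(a, b, c, d);
  for (std::size_t i = 0; i < r.size(); ++i)
    assert(r[i] == static_cast<short>(i));
}
```

Keeping everything in one function puts the reordering condition right next to the default left fold, so the two cases cannot drift apart as separate overloads might.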
> +
> template <int _N0, int _N1, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
> __vec_builtin... _TVs>
> [[__gnu__::__always_inline__]]
> --
> ──────────────────────────────────────────────────────────────────────────
> Dr. Matthias Kretz https://mattkretz.github.io
> GSI Helmholtz Center for Heavy Ion Research https://gsi.de
> std::simd
> ──────────────────────────────────────────────────────────────────────────
>
>