[PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))

Michael Meissner Sun, 19 Nov 2023 20:18:57 -0800

This is similar to the patches on November 10th.

    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html


to add a set of built-in functions that use the PowerPC __vector_pair type and
that provide basic operations on vector pairs.

After I posted these patches, it was decided that it would be better to use a
new type rather than a bunch of new built-in functions.  Within the GCC
context, the best way to add this support is to extend the vector modes so
that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are
used.

These patches are to provide this new implementation.
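As a rough illustration of what the user-visible side looks like, the sketch
below declares 32-byte types with the generic GCC vector extension, one per
mode listed above.  The typedef names and the fma_v4df helper are hypothetical
and are not taken from the patches; on targets without 32-byte vector support
GCC simply lowers the operations to smaller pieces.

```c
/* 32-byte vector types, one per machine mode named above.  The typedef
   names are illustrative only.  */
typedef double      v4df  __attribute__ ((vector_size (32)));  /* V4DFmode  */
typedef float       v8sf  __attribute__ ((vector_size (32)));  /* V8SFmode  */
typedef long long   v4di  __attribute__ ((vector_size (32)));  /* V4DImode  */
typedef int         v8si  __attribute__ ((vector_size (32)));  /* V8SImode  */
typedef short       v16hi __attribute__ ((vector_size (32)));  /* V16HImode */
typedef signed char v32qi __attribute__ ((vector_size (32)));  /* V32QImode */

/* Hypothetical helper: r = a * b + c, element-wise on 4 doubles.  Pointers
   are used so the example does not depend on how a target passes 32-byte
   vectors by value.  */
static inline void
fma_v4df (v4df *r, const v4df *a, const v4df *b, const v4df *c)
{
  *r = *a * *b + *c;
}
```

The arithmetic is written with ordinary operators; no vector pair specific
built-in function is needed at the source level.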

While in theory you could add a whole new type that isn't a larger size vector,
my experience with IEEE 128-bit floating point is that GCC really doesn't like
two modes that are the same size but have different implementations (such as
we see with IEEE 128-bit floating point and IBM double-double 128-bit floating
point).  So I did not consider adding a new mode for use with vector pairs.

My original intention was to just implement V4DFmode and V8SFmode, since the
primary users asking for vector pair support are people implementing high-end
math libraries like Eigen and BLAS.

However, in implementing this code, I discovered that we will need integer
vector pair support as well as floating point vector pair support.  The
integer modes and types are needed to properly implement byte shuffling and
vector comparisons, both of which need integer vector pairs.
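One way to see the dependency on integer types: with the generic GCC vector
extension, comparing two floating point vectors yields a signed integer vector
of the same width and element count.  The sketch below is only an
illustration (the max_v4df name is hypothetical, and the patches may implement
this differently); it builds an element-wise maximum out of such a mask.

```c
typedef double    v4df __attribute__ ((vector_size (32)));
typedef long long v4di __attribute__ ((vector_size (32)));

/* Hypothetical helper: element-wise maximum of two 4 x double vectors.  */
static inline void
max_v4df (v4df *r, const v4df *a, const v4df *b)
{
  /* The comparison produces an integer vector: all ones (-1) in lanes
     where a > b, zero elsewhere.  */
  v4di mask = (v4di) (*a > *b);

  /* Select lanes with bitwise operations on the integer view of the
     data.  The casts reinterpret the bits; they do not convert values.  */
  *r = (v4df) (((v4di) *a & mask) | ((v4di) *b & ~mask));
}
```

Even though the inputs and the result are floating point, the intermediate
mask and the bitwise selection both need a 32-byte integer type.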

With the current patches, vector pair support is not enabled by default.  The
main reason is that I have not yet implemented the support for byte shuffling,
which various tests depend on.

I would also like to implement overloads for the vector built-in functions like
vec_add, vec_sum, etc. so that if you give them a vector pair, they handle it
just as they would a vector type.

In addition, once the various bugs are addressed, I would then implement the
support so that automatic vectorization would consider using vector pairs
instead of vectors.

In terms of benchmarks, I wrote two benchmarks:

   1)   One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]).  That is
        a loop with 3 loads and a store per iteration.

   2)   Another benchmark produces a scalar sum of an entire vector.  This is a
        loop that just has a single load and no store.
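For readers who want a concrete picture, the two loops might look roughly
like the sketch below when written with a 32-byte vector type.  This is not
the actual benchmark source; the function names, the choice of double, and the
final horizontal add are assumptions made for illustration.

```c
#include <stddef.h>

typedef double v4df __attribute__ ((vector_size (32)));

/* Saxpy type loop: 3 loads (value, a, b) and 1 store per iteration.
   n counts 32-byte vectors, not scalar elements.  */
static inline void
saxpy_v4df (v4df *value, const v4df *a, const v4df *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    value[i] += a[i] * b[i];
}

/* Scalar sum of an entire array: 1 load and no store per iteration,
   followed by a horizontal add of the 4 partial sums.  */
static inline double
sum_v4df (const v4df *a, size_t n)
{
  v4df acc = { 0.0, 0.0, 0.0, 0.0 };

  for (size_t i = 0; i < n; i++)
    acc += a[i];

  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The sum loop carries a dependency on acc from one iteration to the next,
which the saxpy loop does not have.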

For the saxpy type loop, I get the following general numbers for both float and
double:

   1)   The benchmarks that use attribute((vector_size(32))) are roughly 9-10%
        faster than using normal vector processing (both auto vectorize and
        using vector types).

   2)   The benchmarks that use attribute((vector_size(32))) are roughly 19-20%
        faster than if I write the loop using vector pair loads with the
        existing built-ins, and then manually split the values and do the
        arithmetic and single vector stores.

Unfortunately, for floating point, doing the sum of the whole vector in a
simple loop is slower using the new vector pair built-in functions than using
the existing built-ins for disassembling vector pairs.  If I write more
complex loops that manually unroll the loop, then the floating point vector
pair built-in functions become like the integer vector pair built-in
functions.  So there is some amount of tuning that will need to be done.

There are 4 patches in this set:

The first patch adds support for the types and for moves, and provides some
optimizations for extracting an element and setting an element.

The second patch implements the floating point arithmetic operations.

The third patch implements the integer operations.

The fourth patch provides new tests to test these features.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
