On Mon, Nov 20, 2023 at 5:19 AM Michael Meissner <meiss...@linux.ibm.com> wrote:
>
> This is similar to the patches on November 10th.
>
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html
>
> to add a set of built-in functions that use the PowerPC __vector_pair type and
> that provide a set of functions to do basic operations on vector pairs.
>
> After I posted these patches, it was decided that it would be better to have a
> new type that is used rather than a bunch of new built-in functions.  Within
> the GCC context, the best way to add this support is to extend the vector
> modes
> so that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are
> used.
>
> These patches are to provide this new implementation.
>
> While in theory you could add a whole new type that isn't a larger size
> vector,
> my experience with IEEE 128-bit floating point is that GCC really doesn't like
> 2 modes that are the same size but have different implementations (such as we
> see with IEEE 128-bit floating point and IBM double-double 128-bit floating
> point).  So I did not consider adding a new mode for using with vector pairs.
>
> My original intention was to just implement V4DFmode and V8SFmode, since the
> primary users asking for vector pair support are people implementing the high
> end math libraries like Eigen and Blas.
>
> However in implementing this code, I discovered that we will need integer
> vector pair support as well as floating point vector pair.  The integer modes
> and types are needed to properly implement byte shuffling and vector
> comparisons which need integer vector pairs.
>
> With the current patches, vector pair support is not enabled by default.  The
> main reason is I have not implemented the support for byte shuffling which
> various tests depend on.
>
> I would also like to implement overloads for the vector built-in functions
> like vec_add, vec_sum, etc., so that if you give one a vector pair, it would
> handle it just as if you had given it a vector type.
>
> In addition, once the various bugs are addressed, I would then implement the
> support so that automatic vectorization would consider using vector pairs
> instead of vectors.
>
> In terms of benchmarks, I wrote two benchmarks:
>
>    1)   One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]).  That
>         is a loop with 3 loads and a store per loop.
>
>    2)   Another benchmark produces a scalar sum of an entire vector.  This is
>         a loop that just has a single load and no store.
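
For readers following along, the two loop shapes described above can be sketched in plain scalar C as below. This is only an illustration of the loop structure: the function names saxpy_like and sum_all are hypothetical and are not taken from the actual benchmark sources, which were not posted in this message.

```c
#include <stddef.h>

/* Hypothetical name: saxpy-type loop, 3 loads (value, a, b) and 1 store
   per iteration.  */
static void
saxpy_like (double *value, const double *a, const double *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    value[i] += a[i] * b[i];
}

/* Hypothetical name: scalar sum of a whole array, 1 load and no store
   per iteration.  */
static double
sum_all (const double *x, size_t n)
{
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += x[i];
  return total;
}
```
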
>
> For the saxpy type loop, I get the following general numbers for both float
> and double:
>
>    1)   The benchmarks that use attribute((vector_size(32))) are roughly 9-10%
>         faster than using normal vector processing (both auto vectorize and
>         using vector types).
>
>    2)   The benchmarks that use attribute((vector_size(32))) are roughly
>         19-20% faster than if I write the loop using the vector pair loads
>         using the existing built-ins, and then manually split the values and
>         do the arithmetic and single vector stores.
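
As a rough illustration of what "using attribute((vector_size(32)))" looks like at the source level, here is a minimal sketch of the saxpy-type loop written with GCC's generic vector extension. The typedef name v4df and the function name saxpy_v4df are assumptions made for this example, and whether the compiler maps the 32-byte type to a vector pair depends on the target and on the patches under discussion; on other targets GCC simply lowers the operations to whatever vector width is available.

```c
#include <stddef.h>

/* 32-byte vector of 4 doubles, using GCC's generic vector extension.
   The name v4df is chosen for this example only.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical name: saxpy-type loop over n_vec 32-byte vectors.
   Each iteration does 3 vector loads and 1 vector store.  */
static void
saxpy_v4df (v4df *value, const v4df *a, const v4df *b, size_t n_vec)
{
  for (size_t i = 0; i < n_vec; i++)
    value[i] += a[i] * b[i];
}
```
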
>
> Unfortunately, for floating point, doing the sum of the whole vector with a
> simple loop is slower using the new vector pair built-in functions than using
> the existing built-ins for disassembling vector pairs.  If I write more
> complex loops that manually unroll the loop, then the floating point vector
> pair built-in functions become like the integer vector pair built-in
> functions.  So there is some amount of tuning that will need to be done.
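
One common way to manually unroll a sum reduction is to keep several independent accumulators so that consecutive adds do not depend on each other. The sketch below shows that general technique with the generic vector extension; it is an assumption about what a "more complex loop" might look like, not the loop used in the benchmark above, and the names v8sf and sum_unrolled are hypothetical.

```c
#include <stddef.h>

/* 32-byte vector of 8 floats; the name v8sf is chosen for this example.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Hypothetical name: sum n_vec 32-byte vectors, unrolled by 2 with two
   independent accumulators, then reduce the lanes to a scalar.  */
static float
sum_unrolled (const v8sf *x, size_t n_vec)
{
  v8sf acc0 = { 0 };   /* remaining lanes are zero-initialized */
  v8sf acc1 = { 0 };
  size_t i;

  for (i = 0; i + 1 < n_vec; i += 2)
    {
      acc0 += x[i];
      acc1 += x[i + 1];
    }

  /* Handle a leftover vector when n_vec is odd.  */
  if (i < n_vec)
    acc0 += x[i];

  v8sf acc = acc0 + acc1;
  float total = 0.0f;
  for (int lane = 0; lane < 8; lane++)
    total += acc[lane];
  return total;
}
```

Note that splitting the sum across accumulators changes the order of the floating point additions, so results can differ slightly from a strictly sequential sum.
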
>
> There are 4 patches in this set:
>
> The first patch adds support for the types and for moves, and provides some
> optimizations for extracting an element and setting an element.
>
> The second patch implements the floating point arithmetic operations.
>
> The third patch implements the integer operations.
>
> The fourth patch provides new tests to test these features.

I wouldn't expose the "fake" larger modes to the vectorizer but rather
adjust m_suggested_unroll_factor (which you already do to some extent).

> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meiss...@linux.ibm.com
