On Mon, Nov 20, 2023 at 5:19 AM Michael Meissner <meiss...@linux.ibm.com> wrote:
>
> This is similar to the patches on November 10th.
>
>  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html
>
> to add a set of built-in functions that use the PowerPC __vector_pair type and
> that provide a set of functions to do basic operations on vector pairs.
>
> After I posted these patches, it was decided that it would be better to have a
> new type that is used rather than a bunch of new built-in functions. Within
> the GCC context, the best way to add this support is to extend the vector modes
> so that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are
> used.
>
> These patches are to provide this new implementation.
>
> While in theory you could add a whole new type that isn't a larger size vector,
> my experience with IEEE 128-bit floating point is that GCC really doesn't like
> 2 modes that are the same size but have different implementations (such as we
> see with IEEE 128-bit floating point and IBM double-double 128-bit floating
> point). So I did not consider adding a new mode for use with vector pairs.
>
> My original intention was to just implement V4DFmode and V8SFmode, since the
> primary users asking for vector pair support are people implementing the high
> end math libraries like Eigen and BLAS.
>
> However in implementing this code, I discovered that we will need integer
> vector pair support as well as floating point vector pair. The integer modes
> and types are needed to properly implement byte shuffling and vector
> comparisons which need integer vector pairs.
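
As a rough illustration of the six modes listed above: the 32-byte types they would back can be spelled with GCC's generic vector_size attribute. The typedef names and the compare_gt helper below are hypothetical and not taken from the patches. The sketch only shows why comparing a floating point vector pair needs an integer vector pair of the same shape to hold the result.

```c
#include <stddef.h>

/* Hypothetical typedef names; each comment gives the machine mode the
   32-byte type would be expected to map to.  */
typedef double      v4df  __attribute__((vector_size(32)));  /* V4DFmode  */
typedef float       v8sf  __attribute__((vector_size(32)));  /* V8SFmode  */
typedef long long   v4di  __attribute__((vector_size(32)));  /* V4DImode  */
typedef int         v8si  __attribute__((vector_size(32)));  /* V8SImode  */
typedef short       v16hi __attribute__((vector_size(32)));  /* V16HImode */
typedef signed char v32qi __attribute__((vector_size(32)));  /* V32QImode */

/* Every type is 32 bytes, i.e. two 16-byte vector registers.  */
_Static_assert(sizeof(v4df)  == 32, "v4df must be 32 bytes");
_Static_assert(sizeof(v8sf)  == 32, "v8sf must be 32 bytes");
_Static_assert(sizeof(v4di)  == 32, "v4di must be 32 bytes");
_Static_assert(sizeof(v8si)  == 32, "v8si must be 32 bytes");
_Static_assert(sizeof(v16hi) == 32, "v16hi must be 32 bytes");
_Static_assert(sizeof(v32qi) == 32, "v32qi must be 32 bytes");

/* Element-wise a > b.  With the generic vector extension the result of
   comparing double vectors is an integer vector with the same element
   width: -1 in lanes where the comparison holds, 0 elsewhere.  Pointers
   are used so the sketch does not depend on how a target passes 32-byte
   vectors by value.  */
void compare_gt(const v4df *a, const v4df *b, v4di *mask)
{
  *mask = (v4di)(*a > *b);
}
```

The cast to v4di is a same-size reinterpretation; it is there only because the exact integer element type GCC picks for the comparison result can differ between targets.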
>
> With the current patches, vector pair support is not enabled by default. The
> main reason is I have not implemented the support for byte shuffling which
> various tests depend on.
>
> I would also like to implement overloads for the vector built-in functions like
> vec_add, vec_sum, etc. so that if you give them a vector pair, they handle it
> just as if you gave them a vector type.
>
> In addition, once the various bugs are addressed, I would then implement the
> support so that automatic vectorization would consider using vector pairs
> instead of vectors.
>
> In terms of benchmarks, I wrote two benchmarks:
>
>    1) One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]). That is
>       a loop with 3 loads and a store per iteration.
>
>    2) Another benchmark produces a scalar sum of an entire vector. This is a
>       loop that just has a single load and no store.
>
> For the saxpy type loop, I get the following general numbers for both float and
> double:
>
>    1) The benchmarks that use __attribute__((vector_size(32))) are roughly
>       9-10% faster than using normal vector processing (both auto vectorize
>       and using vector types).
>
>    2) The benchmarks that use __attribute__((vector_size(32))) are roughly
>       19-20% faster than if I write the loop using the vector pair loads
>       using the existing built-ins, and then manually split the values and
>       do the arithmetic and single vector stores.
>
> Unfortunately, for floating point, doing the sum of the whole vector is slower
> using the new vector pair built-in functions using a simple loop (compared to
> using the existing built-ins for disassembling vector pairs). If I write more
> complex loops that manually unroll the loop, then the floating point vector
> pair built-in functions become like the integer vector pair built-in
> functions. So there is some amount of tuning that will need to be done.
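
For reference, the two benchmark kernels described above might look roughly like this when written with 32-byte vector types. This is a sketch under assumptions, not the benchmark source: the function names and the v4df typedef are made up here, and n counts 4-double vectors rather than scalar elements.

```c
#include <stddef.h>

/* Hypothetical 32-byte vector of 4 doubles (GCC generic vector extension).  */
typedef double v4df __attribute__((vector_size(32)));

/* saxpy-type loop: value[i] += a[i] * b[i].
   Each iteration does three vector loads and one vector store.  */
void saxpy_v4df(v4df *value, const v4df *a, const v4df *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    value[i] += a[i] * b[i];
}

/* Scalar sum of a whole array: one vector load per iteration, no store.
   Partial sums are kept in a vector accumulator and reduced at the end.  */
double sum_v4df(const v4df *a, size_t n)
{
  v4df acc = {0.0, 0.0, 0.0, 0.0};

  for (size_t i = 0; i < n; i++)
    acc += a[i];

  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Note that the sum loop carries a dependency on acc from one iteration to the next, which is one plausible reason manual unrolling with several accumulators changes the picture; that is an inference, not something the numbers above establish.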
>
> There are 4 patches in this set:
>
> The first patch adds support for the types, and does moves, and provides some
> optimizations for extracting an element and setting an element.
>
> The second patch implements the floating point arithmetic operations.
>
> The third patch implements the integer operations.
>
> The fourth patch provides new tests to test these features.

I wouldn't expose the "fake" larger modes to the vectorizer but rather
adjust m_suggested_unroll_factor (which you already do to some extent).

> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meiss...@linux.ibm.com