This is similar to the patches posted on November 10th:

  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html

Those patches added a set of built-in functions that use the PowerPC
__vector_pair type to do basic operations on vector pairs.

After I posted these patches, it was decided that it would be better to have a
new type rather than a bunch of new built-in functions. Within the GCC context,
the best way to add this support is to extend the vector modes so that
V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are used.

These patches provide this new implementation.

While in theory you could add a whole new type that isn't a larger size vector,
my experience with IEEE 128-bit floating point is that GCC really doesn't like
2 modes that are the same size but have different implementations (such as we
see with IEEE 128-bit floating point and IBM double-double 128-bit floating
point). So I did not consider adding a new mode for use with vector pairs.

My original intention was to implement just V4DFmode and V8SFmode, since the
primary users asking for vector pair support are people implementing high end
math libraries like Eigen and BLAS.

However, in implementing this code, I discovered that we will need integer
vector pair support as well as floating point vector pair support. The integer
modes and types are needed to properly implement byte shuffling and vector
comparisons, both of which need integer vector pairs.

With the current patches, vector pair support is not enabled by default. The
main reason is that I have not implemented the support for byte shuffling,
which various tests depend on.

I would also like to implement overloads for the vector built-in functions like
vec_add, vec_sum, etc., so that if you give one of them a vector pair, it
handles it just as if you had given it a vector type.

In addition, once the various bugs are addressed, I would then implement the
support so that automatic vectorization would consider using vector pairs
instead of vectors.

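As an illustration of what user code written against a 32-byte vector type
looks like, here is a minimal sketch using GCC's generic vector extension. The
typedef name v4df and the helper names madd_v4df and sum_v4df are hypothetical
and are not taken from the patches. Whether such code is actually compiled to
vector pair instructions depends on the target and on options that are not
described here, since the support is not enabled by default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical typedef: a 32-byte generic vector holding 4 doubles.  */
typedef double v4df __attribute__((vector_size(32)));

/* value[i] += a[i] * b[i], four doubles at a time, with a scalar tail.
   memcpy is used for the loads and stores so that the arrays do not need
   to be 32-byte aligned.  */
void madd_v4df(double *value, const double *a, const double *b, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4df va, vb, vv;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vv, value + i, sizeof vv);
        vv += va * vb;                    /* element-wise multiply and add */
        memcpy(value + i, &vv, sizeof vv);
    }

    for (; i < n; i++)                    /* leftover elements */
        value[i] += a[i] * b[i];
}

/* Scalar sum of an array, accumulating in a 32-byte vector.  */
double sum_v4df(const double *x, size_t n)
{
    v4df acc = { 0.0, 0.0, 0.0, 0.0 };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4df v;
        memcpy(&v, x + i, sizeof v);
        acc += v;
    }

    /* Reduce the four lanes, then add the leftover elements.  */
    double s = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)
        s += x[i];
    return s;
}
```

On targets without 32-byte vector registers, GCC lowers these operations to
smaller vectors or scalars, so the sketch should build with any recent GCC.
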
In terms of benchmarks, I wrote two benchmarks:

   1) One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]). That is
      a loop with 3 loads and a store per iteration.

   2) Another benchmark produces a scalar sum of an entire vector. This is a
      loop that has just a single load and no store.

For the saxpy type loop, I get the following general numbers for both float and
double:

   1) The benchmarks that use __attribute__((vector_size(32))) are roughly
      9-10% faster than using normal vector processing (both auto
      vectorization and using vector types).

   2) The benchmarks that use __attribute__((vector_size(32))) are roughly
      19-20% faster than if I write the loop using vector pair loads with the
      existing built-ins, and then manually split the values and do the
      arithmetic and single vector stores.

Unfortunately, for floating point, doing the sum of the whole vector is slower
using the new vector pair built-in functions in a simple loop (compared to
using the existing built-ins for disassembling vector pairs). If I write more
complex loops that manually unroll the loop, then the floating point vector
pair built-in functions behave like the integer vector pair built-in
functions. So there is some amount of tuning that will need to be done.

There are 4 patches in this set:

   1) The first patch adds support for the types, implements moves, and
      provides some optimizations for extracting an element and setting an
      element.

   2) The second patch implements the floating point arithmetic operations.

   3) The third patch implements the integer operations.

   4) The fourth patch provides new tests for these features.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com