Add basic support for vector_size(32).

We have had several users ask us to implement ways of using the Power10 load
vector pair and store vector pair instructions to speed up their code through
reduced memory bandwidth.

I had originally posted the following patches:

    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
    *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html

to add a set of built-in functions that use the PowerPC __vector_pair type and
provide basic operations on vector pairs.

After I posted these patches, it was decided that it would be better to add a
new type rather than a bunch of new built-in functions.  Within the GCC
context, the best way to add this support is to extend the vector modes so
that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are
used.
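
As an illustration (a sketch, not code from the patch), the GCC generic vector
types below correspond to the six modes listed above.  The typedef names are
hypothetical.  The assumption is that with the patch and -mvector-size-32 each
type gets the mode named in its comment and lives in a pair of 16-byte VSX
registers; without it, GCC still accepts these types but splits their
operations into smaller pieces.

```c
#include <stdint.h>

/* Hypothetical 32-byte vector types, one per vector pair mode.  The
   comments give the mode each type is intended to map to when
   -mvector-size-32 is in effect.  */
typedef double  v4df  __attribute__ ((vector_size (32)));  /* V4DFmode:  4 x 64-bit  */
typedef float   v8sf  __attribute__ ((vector_size (32)));  /* V8SFmode:  8 x 32-bit  */
typedef int64_t v4di  __attribute__ ((vector_size (32)));  /* V4DImode:  4 x 64-bit  */
typedef int32_t v8si  __attribute__ ((vector_size (32)));  /* V8SImode:  8 x 32-bit  */
typedef int16_t v16hi __attribute__ ((vector_size (32)));  /* V16HImode: 16 x 16-bit */
typedef int8_t  v32qi __attribute__ ((vector_size (32)));  /* V32QImode: 32 x 8-bit  */

/* Every type is 32 bytes, the size of two 16-byte vector registers.  */
_Static_assert (sizeof (v4df) == 32 && sizeof (v8sf) == 32
                && sizeof (v4di) == 32 && sizeof (v8si) == 32
                && sizeof (v16hi) == 32 && sizeof (v32qi) == 32,
                "vector pair types must be 32 bytes");
```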

While in theory you could add a whole new type that isn't just a larger
vector, my experience with IEEE 128-bit floating point is that GCC really
doesn't like two modes that are the same size but have different
implementations (as we see with IEEE 128-bit floating point and IBM
double-double 128-bit floating point).  So I did not consider adding a new
mode for use with vector pairs.

My original intention was to just implement V4DFmode and V8SFmode, since the
primary users asking for vector pair support are people implementing high-end
math libraries such as Eigen and BLAS.

However, in implementing this code, I discovered that we need integer vector
pair support as well as floating point vector pair support.  The integer modes
and types are needed to properly implement byte shuffling and vector
comparisons, both of which require integer vector pairs.
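
A small example of why comparisons pull in the integer modes: in GCC's generic
vector extension, comparing two vectors of doubles yields a signed integer
vector of the same width, with each element set to -1 (all bits set) where the
comparison is true and 0 where it is false.  A V4DFmode comparison therefore
produces a V4DImode result.  The sketch below (hypothetical names, not part of
the patch) uses that mask for a branch-free element-wise maximum.

```c
#include <stdint.h>

typedef double  v4df __attribute__ ((vector_size (32)));
typedef int64_t v4di __attribute__ ((vector_size (32)));

/* Element-wise *a > *b.  Each lane of the integer result is -1 where
   the comparison holds and 0 where it does not.  */
static void
compare_gt (const v4df *a, const v4df *b, v4di *mask)
{
  *mask = *a > *b;
}

/* Branch-free maximum: use the integer mask to select lanes from the
   bit patterns of a and b, then reinterpret the result as doubles.
   Casts between same-size vector types are bit reinterpretations.  */
static void
select_max (const v4df *a, const v4df *b, v4df *out)
{
  v4di mask = *a > *b;
  v4di ai = (v4di) *a;
  v4di bi = (v4di) *b;
  *out = (v4df) ((ai & mask) | (bi & ~mask));
}
```

The results are passed through pointers rather than by value, which avoids
ABI warnings for 32-byte vector arguments on targets that lack native 32-byte
vectors.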

With the current patches, vector pair support is not enabled by default.  The
main reason is that I have not yet implemented support for byte shuffling,
which various tests depend on.

I would also like to implement overloads for the vector built-in functions
such as vec_add, vec_sum, etc., so that if you give them a vector pair, they
handle it just as they would a vector type.

In addition, once the various bugs are addressed, I would then implement
support so that automatic vectorization considers using vector pairs instead
of vectors.

This is the first patch in the series.  It implements the basic modes, and
it allows for initialization of the modes.  I've added some optimizations for
extracting and setting fields within the vector pair.
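
For illustration, the operations this first patch targets look like this at
the source level: building a vector from scalars, and reading or writing one
element.  This is a sketch with hypothetical function names.  Since each
16-byte half of a vector pair holds two of the four doubles, a single-element
access presumably only needs to touch one register of the pair, which is the
kind of case the extract/set optimizations can exploit.

```c
typedef double v4df __attribute__ ((vector_size (32)));

/* Build a vector from four scalars with a vector constructor.  */
static void
make_v4df (v4df *out, double a, double b, double c, double d)
{
  *out = (v4df) { a, b, c, d };
}

/* Read one element; GCC allows vectors to be subscripted like arrays,
   and the index may be a run-time value.  */
static double
get_lane (const v4df *v, int i)
{
  return (*v)[i];
}

/* Overwrite one element, leaving the others unchanged.  */
static void
set_lane (v4df *v, int i, double x)
{
  (*v)[i] = x;
}
```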

The second patch will implement the floating point vector pair support.

The third patch will implement the integer vector pair support.

The fourth patch will add new tests to the test suite.

When I test a saxpy-type loop (a[i] += (b[i] * c[i])), I generally see a 10%
improvement over either auto-vectorization or just using the vector types.
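
For reference, the kind of loop being measured can be written in scalar form
(what the auto-vectorizer sees) and with a 32-byte vector type.  This is a
sketch: the function names are hypothetical, memcpy is used for the loads and
stores so the arrays need not be 32-byte aligned, and the expectation that
each v4df load or store becomes a single load/store vector pair instruction
assumes Power10 with -mvector-size-32.

```c
#include <stddef.h>
#include <string.h>

typedef double v4df __attribute__ ((vector_size (32)));

/* Scalar form: a[i] += b[i] * c[i].  */
static void
saxpy_like_scalar (double *a, const double *b, const double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += b[i] * c[i];
}

/* Same loop, four doubles per iteration using a 32-byte vector type.
   memcpy keeps the accesses valid for unaligned arrays; compilers
   typically turn it into plain vector loads and stores.  */
static void
saxpy_like_v4df (double *a, const double *b, const double *c, size_t n)
{
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    {
      v4df va, vb, vc;
      memcpy (&va, a + i, sizeof va);
      memcpy (&vb, b + i, sizeof vb);
      memcpy (&vc, c + i, sizeof vc);
      va += vb * vc;
      memcpy (a + i, &va, sizeof va);
    }
  for (; i < n; i++)  /* scalar tail for the last n % 4 elements */
    a[i] += b[i] * c[i];
}
```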

I have tested these patches on a little endian power10 system.  With
-mvector-size-32 disabled by default, there are no regressions in the
test suite.

I have also built and run the tests on both little endian power9 and big
endian power9 systems, and there are no regressions.  Can I check these
patches into the master branch?

2023-11-19  Michael Meissner  <meiss...@linux.ibm.com>

gcc/

        * config/rs6000/constraints.md (eV): New constraint.
        * config/rs6000/predicates.md (const_0_to_31_operand): New predicate.
        (easy_vector_constant): Add support for vector pair constants.
        (easy_vector_pair_constant): New predicate.
        (mma_assemble_input_operand): Allow other 16-byte vector modes than
        V16QImode.
        * config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
        __VECTOR_SIZE_32__ if -mvector-size-32.
        * config/rs6000/rs6000-protos.h (vector_pair_to_vector_mode): New
        declaration.
        (split_vector_pair_constant): Likewise.
        (rs6000_expand_vector_pair_init): Likewise.
        * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Use
        VECTOR_PAIR_MODE instead of comparing mode to OOmode.
        (rs6000_modes_tieable_p): Allow various vector pair modes to pair with
        each other.  Allow 16-byte vectors to pair with vector pair modes.
        (rs6000_setup_reg_addr_masks): Use VECTOR_PAIR_MODE instead of comparing
        mode to OOmode.
        (rs6000_init_hard_regno_mode_ok): Setup vector pair mode basic type
        information and reload handlers.
        (rs6000_option_override_internal): Warn if -mvector-size-32 is used
        without -mcpu=power10 or -mmma.
        (vector_pair_to_vector_mode): New function.
        (split_vector_pair_constant): Likewise.
        (rs6000_expand_vector_pair_init): Likewise.
        (reg_offset_addressing_ok_p): Add support for vector pair modes.
        (rs6000_emit_move): Likewise.
        (rs6000_preferred_reload_class): Likewise.
        (altivec_expand_vec_perm_le): Likewise.
        (rs6000_opt_vars): Add -mvector-size-32 switch.
        (rs6000_split_multireg_move): Add support for vector pair modes.
        * config/rs6000/rs6000.h (VECTOR_PAIR_MODE): New macro.
        * config/rs6000/rs6000.md (wd mode attribute): Add vector pair modes.
        (RELOAD mode iterator): Likewise.
        (toplevel): Include vector-pair.md.
        * config/rs6000/rs6000.opt (-mvector-size-32): New option.
        * config/rs6000/vector-pair.md: New file.
        * doc/md.texi (PowerPC constraints): Document the eV constraint.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
