This is version 4 of the patches. The previous patches were:

* https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707452.html
* https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707453.html
* https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707454.html
* https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707455.html
* https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707456.html

Compared to the V3 patches, the main change is that adding the 'wD' constraint and the accumulator_operand predicate is now separate from changing the MMA instructions to use them. That change is deferred until a later patch. The patches in this patch set are:

Patch #1 adds the 'wD' constraint and the accumulator_operand predicate, but does not actually use them.

Patch #2 just adds the -mdense-math option. No code in this patch uses the option.

Patch #3 adds support for the 512-bit dense math registers as a separate register set, although nothing uses the new register set yet. Note, this patch cannot really be broken up into smaller patches, because the basic consistency checks for registers fail unless a register set is fully supported.

Patch #4 switches the MMA instructions to use the new dense math registers if -mdense-math is used, and to use the traditional power10/power11 support if -mno-dense-math is used. The -mcpu=future option now sets -mdense-math by default.

Patch #5 adds support for dense math registers that are treated as 1,024 bits instead of 512.

Here is the introduction to dense math registers that I used in the version 3 patches:

The Dense Math Facility (DMF) is designed to be an extension to the ISA 3.1 (i.e. power10/power11) MMA facility. Since these are future patches, the Dense Math Facility may appear in future PowerPC machines, or it may never be used in real hardware.

One of the concepts of the DMF system is that the accumulators used by the MMA and DMF extensions become separate registers, rather than being overlaid over the traditional floating point registers (i.e. VSX registers 0..31). In addition to being separate registers, the dense math accumulators are now logically 1,024 bits instead of 512. Because of the way the Dense Math registers and instructions are designed, existing power10/power11 MMA instructions that operate on 512 bits will work with Dense Math.

In ISA 3.1, each of the 8 accumulators is overlaid over 4 adjacent FPRs, and the compiler must not touch those 4 FPRs while the MMA accumulator is in use. In the Dense Math system, the accumulator is a separate register. When -mcpu=power11 or -mcpu=power10 is used, GCC will not allocate the FPR (VSX) registers that are overlaid by an accumulator when generating MMA instructions. If a function compiled for Power10/Power11 is run on a system with Dense Math support enabled, the effect is that a number of FPRs are not allocated, because the compiler assumes the accumulators occupy them.

After these patches are applied, if the user compiles the code with -mcpu=future, the compiler can allocate up to 32 more vector registers, because the Dense Math accumulators are separate registers. In fact, two of the MMA tests (mma-double-test.c and mma-single-test.c) do about 20 fewer spills of floating point values to the stack, since the compiler can use those FPR vector registers for other purposes.

I have built and bootstrapped little endian compilers on power10 systems, and there were no regressions in the tests. Can I add the patches to the GCC trunk after the -mcpu=future patch is applied?

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]
