I have posted this patch several times over the years. I am reposting it in case the last time I posted it got lost.
The last time I posted this patch was on October 28th:

    https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666662.html

The patch was originally posted about a year ago, during the GCC 14 time frame, and I am posting it again in the hope that I can get it into GCC 15. In the GCC 14 time frame, 1,024-bit registers were not supported due to the bit length in internal structures. In GCC 15, 1,024-bit registers are now supported.

Note, these patches are for a potential future PowerPC. They are not targeted towards a specific CPU, and they may change if/when a PowerPC with this instruction set is released. The main motivation is to get support for the 1,024-bit dense math registers into the current GCC.

In the current power10 hardware, the eight 512-bit accumulator registers overlap with the VSX registers 0..31. If dense math register support is added in a future machine, the accumulators will become separate registers. The current instructions will still work, but they will use the new registers. For existing code, this means the VSX registers that currently overlap with the accumulators will no longer be used, and the separate dense math registers will be used instead.

One of the important changes in these patches is the addition of a new constraint ('wD'). When code is compiled for the power10, 'wD' will match the VSX registers 0..31 (i.e. the traditional floating point registers). When code is compiled for the potential future machine, 'wD' will match the new separate dense math registers. Thus __asm__ code that uses the accumulator registers should change its 'd' constraints to 'wD'.
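
As a rough illustration of the constraint change (a sketch, not code from the patch), an extended asm wrapper could pick the constraint at build time. The names HAVE_WD_CONSTRAINT, ACC_CONSTRAINT and acc_f32_update are hypothetical and exist only for this example. The sketch assumes the build defines HAVE_WD_CONSTRAINT only when the compiler accepts 'wD', and that the %A operand modifier prints the accumulator number for both register layouts.

```c
/* HAVE_WD_CONSTRAINT is a hypothetical macro for this sketch: define it
   in the build only when the compiler accepts the 'wD' constraint.  */
#ifdef HAVE_WD_CONSTRAINT
# define ACC_CONSTRAINT "wD"	/* FPRs on power10, dense math registers later.  */
#else
# define ACC_CONSTRAINT "d"	/* Older compilers: accumulator overlaps the FPRs.  */
#endif

#ifdef __MMA__
typedef __vector unsigned char vec_t;

/* Accumulate the outer product of A and B into *ACC (xvf32gerpp).
   Assumption: %A0 prints the accumulator number of operand 0.  */
static inline void
acc_f32_update (__vector_quad *acc, vec_t a, vec_t b)
{
  __asm__ ("xvf32gerpp %A0,%x1,%x2"
	   : "+" ACC_CONSTRAINT (*acc)
	   : "wa" (a), "wa" (b));
}
#endif
```

The asm body is identical in both cases. Only the constraint string differs, so the same source can be built for MMA with or without dense math registers.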

The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math:

    1)	If possible, don't use extended asm, but instead use the MMA built-in functions;

    2)	If you do need to write extended asm, change the 'd' constraints targeting accumulators to 'wD' when using GCC 15 or later;

    3)	Only use the built-in zero, assemble and disassemble functions to move data between vector quad types and dense math accumulators. I.e. do not use xxmfacc, xxmtacc, and xxsetaccz directly in the extended asm code. The reason is that these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and the accumulator that overlaps with those registers. With accumulators now being separate registers, there is no longer a 1-to-1 correspondence.

Note, the first patch of the previous patch set, which enables the memory move optimizations to use load/store vector pair instructions for -mcpu=future, has been moved to the -mcpu=future support.

This patch assumes the previous patches submitted on November 16th have been applied:

    Add more user friendly TARGET_names for PowerPC
    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669067.html

    Add support for -mcpu=future in the PowerPC
    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669099.html

Logically the following patches might not be needed by these patches, but I haven't tried the combination:

    Do not allow -mvsx to boost the cpu to power7
    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669106.html

    Separate PowerPC ISA bits from architecture bits set by -mcpu=<xxx>
    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669108.html

The other bug fixes posted are independent of this patch.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com