https://gcc.gnu.org/g:38501e5227561f090135f68153702b6e2d6a5514
commit 38501e5227561f090135f68153702b6e2d6a5514
Author: Michael Meissner <meiss...@linux.ibm.com>
Date:   Wed Jul 30 01:19:34 2025 -0400

    Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.bugs | 329 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 329 insertions(+)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
index f79402345eee..5ca8e803434c 100644
--- a/gcc/ChangeLog.bugs
+++ b/gcc/ChangeLog.bugs
@@ -1,3 +1,332 @@
+==================== Branch work217-bugs, patch #104 ====================
+
+PR target/120681 - allow -mcmodel=large with PC relative addressing
+
+When I implemented the pc-relative support for power10 in GCC, I
+disabled using pc-relative support for -mcmodel=large. At the time, I
+didn't want to dig into the issues. It is now time to allow
+-mcmodel=large to generate pc-relative code.
+
+This patch allows -mcmodel=large to use prefixed addressing on power10,
+power11, and possibly other future PowerPC processors in addition to
+the current -mcmodel=medium support.
+
+2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
+
+gcc/
+
+	PR target/120681
+	* config/rs6000/linux64.h (PCREL_SUPPORTED_BY_OS): Allow large
+	code model as well as medium code model.
+	* config/rs6000/rs6000.cc (rs6000_option_override_internal):
+	Likewise.
+	(rs6000_elf_declare_function_name): Don't create the
+	local/non-local labels for large code model if we are using
+	PC-relative addressing.
+
+gcc/testsuite/
+
+	PR target/120681
+	* gcc.target/powerpc/pr120681.c: New test.
+
+==================== Branch work217-bugs, patch #103 ====================
+
+PR target/108958 -- simplify mtvsrdd to zero extend GPR DImode to VSX TImode
+
+Before this patch GCC would zero extend a DImode GPR value to TImode by first
+zero extending the DImode value into a GPR TImode register pair, and then do a
+MTVSRDD to move this value to a VSX register.
+
+For example, consider the following code:
+
+    #ifndef TYPE
+    #define TYPE unsigned long long
+    #endif
+
+    void
+    gpr_to_vsx (TYPE x, __uint128_t *p)
+    {
+      __uint128_t y = x;
+      __asm__ (" # %x0" : "+wa" (y));
+      *p = y;
+    }
+
+Currently GCC generates:
+
+    gpr_to_vsx:
+        mr 10,3
+        li 11,0
+        mtvsrdd 0,11,10
+    #APP
+         # 0
+    #NO_APP
+        stxv 0,0(4)
+        blr
+
+I.e. the mr and li instructions create the zero extended TImode value
+in a GPR, and then the mtvsrdd instruction moves both registers into a
+single vector register.
+
+Instead, GCC should generate the following code, since the mtvsrdd
+instruction will clear the upper 64 bits if the 2nd argument is 0
+(non-zero values are a GPR to put in the upper 64 bits):
+
+    gpr_to_vsx:
+        mtvsrdd 0,0,3
+    #APP
+         # 0
+    #NO_APP
+        stxv 0,0(4)
+        blr
+
+Originally, I posted a patch that added the zero_extendsiti2 insn. I
+got some pushback about using reload_completed in the split portion of
+the define_insn_and_split. However, this is a case where you
+absolutely have to use the reload_completed test, because if you split
+the code before register allocation to handle the normal case, the split
+insns will not be compiled to generate the appropriate mtvsrdd without
+creating the TImode value in the GPR register. I can imagine there
+might be concern about favoring generating code using the vector
+registers instead of using the GPR registers if the code does not
+require the TImode value to be in a vector register.
+
+I completely rewrote the patch. This patch creates a peephole2 to
+catch this case, and it eliminates creating the TImode variable.
+Instead it just does the MTVSRDD instruction directly. That way it
+will not influence register allocation, and the code will only be
+generated in the specific case where we need the TImode value in a
+vector register.
+
+I have built GCC with the patches in this patch set applied on both
+little and big endian PowerPC systems and there were no regressions.
+Can I apply this patch to GCC 16?
+
+2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
+
+gcc/
+
+	PR target/108958
+	* config/rs6000/rs6000.md (UNSPEC_ZERO_EXTEND): New unspec.
+	(zero_extendsiti2 peephole2): Add a peephole2 to simplify zero
+	extend between DImode value in a GPR to a TImode target in a
+	vector register.
+	(zero_extendsiti2_vsx): New insn.
+
+gcc/testsuite/
+
+	PR target/108958
+	* gcc.target/powerpc/pr108958.c: New test.
+
+==================== Branch work217-bugs, patch #102 ====================
+
+PR target/120528 -- Simplify zero extend from memory to VSX register on power10
+
+Previously GCC would zero extend a DImode value in memory to a TImode
+target in a vector register by first zero extending the DImode value
+into a GPR TImode register pair, and then do a MTVSRDD to move this
+value to a VSX register.
+
+For example, consider the following code:
+
+    #ifndef TYPE
+    #define TYPE unsigned long long
+    #endif
+
+    void
+    mem_to_vsx (TYPE *p, __uint128_t *q)
+    {
+      /* lxvrdx 0,0,3
+         stxv 0,0(4)  */
+
+      __uint128_t x = *p;
+      __asm__ (" # %x0" : "+wa" (x));
+      *q = x;
+    }
+
+It currently generates the following code on power10:
+
+    mem_to_vsx:
+        ld 10,0(3)
+        li 11,0
+        mtvsrdd 0,11,10
+    #APP
+         # 0
+    #NO_APP
+        stxv 0,0(4)
+        blr
+
+Instead it could generate:
+
+    mem_to_vsx:
+        lxvrdx 0,0,3
+    #APP
+         # 0
+    #NO_APP
+        stxv 0,0(4)
+        blr
+
+The lxvr{b,h,w,d}x instructions were added in power10, and they load up
+a vector register with a byte, half-word, word, or double-word value in
+the right most bits, and fill the remaining bits with 0. I noticed this
+code when working on PR target/108958 (for which I just posted the patch).
+
+This patch creates a peephole2 to catch this case, and it eliminates
+creating the TImode variable. Instead it just does the LXVR{B,H,W,D}X
+instruction directly.
+
+I have built GCC with the patches in this patch set applied on both
+little and big endian PowerPC systems and there were no regressions.
+Can I apply this patch to GCC 16?
+
+2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
+
+gcc/
+
+	PR target/120528
+	* config/rs6000/rs6000.md (zero_extend??ti2 peephole2): Add a
+	peephole2 to simplify zero extending a QI/HI/SI/DImode value in
+	memory to a TImode target in a vector register to use the
+	LXVR{B,H,W,D}X instructions.
+
+gcc/testsuite/
+
+	PR target/120528
+	* gcc.target/powerpc/pr120528.c: New test.
+
+==================== Branch work217-bugs, patch #101 ====================
+
+PR target/99293: Optimize splat of a V2DF/V2DI extract with constant element
+
+We had optimizations for splat of a vector extract for the other vector
+types, but we missed having one for V2DI and V2DF. This patch adds a
+combiner insn to do this optimization.
+
+In looking at the source, we had similar optimizations for V4SI and V4SF
+extract and splats, but we missed doing V2DI/V2DF.
+
+Without the patch for the code:
+
+    vector long long splat_dup_l_0 (vector long long v)
+    {
+      return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+    }
+
+the compiler generates (on a little endian power9):
+
+    splat_dup_l_0:
+        mfvsrld 9,34
+        mtvsrdd 34,9,9
+        blr
+
+Now it generates:
+
+    splat_dup_l_0:
+        xxpermdi 34,34,34,3
+        blr
+
+2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
+
+gcc/
+
+	PR target/99293
+	* config/rs6000/vsx.md (vsx_splat_extract_<mode>): New insn.
+
+gcc/testsuite/
+
+	PR target/99293
+	* gcc.target/powerpc/builtins-1.c: Adjust insn count.
+	* gcc.target/powerpc/pr99293.c: New test.
+
+==================== Branch work217-bugs, patch #100 ====================
+
+Add power9 and power10 float to logical optimizations.
+
+I was answering an email from a co-worker and I pointed him to work I had done
+for the Power8 era that optimizes the 32-bit float math library in Glibc. In
+doing so, I discovered that with Power9 and later computers, this optimization
+is no longer taking place.
+
+The glibc 32-bit floating point math functions have code that looks like:
+
+    union u {
+      float f;
+      uint32_t u32;
+    };
+
+    float
+    math_foo (float x, unsigned int mask)
+    {
+      union u arg;
+      float x2;
+
+      arg.f = x;
+      arg.u32 &= mask;
+
+      x2 = arg.f;
+      /* ... */
+    }
+
+On power8 with the optimization it generates:
+
+        xscvdpspn 0,1
+        sldi 9,4,32
+        mtvsrd 32,9
+        xxland 1,0,32
+        xscvspdpn 1,1
+
+I.e., it converts the SFmode to the memory format (instead of the DFmode that is
+used within the register), converts the mask so that it is in the vector
+register in the upper 32-bits, and does a XXLAND (i.e. there is only one direct
+move from GPR to vector register). Then after doing this, it converts the
+upper 32-bits back to DFmode.
+
+If the XSCVSPDPN instruction took the value in the normal 32-bit scalar in a
+vector register, we wouldn't have needed the SLDI of the mask.
+
+On power9/power10/power11 it currently generates:
+
+        xscvdpspn 0,1
+        mfvsrwz 2,0
+        and 2,2,4
+        mtvsrws 1,2
+        xscvspdpn 1,1
+        blr
+
+I.e. convert to SFmode representation, move the value to a GPR, do an AND
+operation, move the 32-bit value with a splat, and then convert it back to
+DFmode format.
+
+With this patch, it now generates:
+
+        xscvdpspn 0,1
+        mtvsrwz 32,2
+        xxland 32,0,32
+        xxspltw 1,32,1
+        xscvspdpn 1,1
+        blr
+
+I.e. convert to SFmode representation, move the mask to the vector register, do
+the operation using XXLAND. Splat the value to get the value in the correct
+location, and then convert back to DFmode.
+
+I have built GCC with the patches in this patch set applied on both little and
+big endian PowerPC systems and there were no regressions. Can I apply
+this patch to the trunk?
+
+2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
+
+gcc/
+
+	PR target/117487
+	* config/rs6000/vsx.md (SFmode logical peephole): Update comments in
+	the original code that supports power8. Add a new define_peephole2 to
+	do the optimization on power9/power10.
+
+gcc/testsuite/
+
+	PR target/117487
+	* gcc.target/powerpc/pr117487.c: New test.
+
 ==================== Branch work217-bugs, baseline ====================
 
 2025-07-30  Michael Meissner  <meiss...@linux.ibm.com>
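
The peephole patches above (#100 through #103) only change which instructions are emitted; the source-level results must stay the same. A minimal sketch of those reference semantics in portable C follows. It is not part of the commit or of the GCC testsuite. All function names are hypothetical, memcpy stands in for the union used in the glibc-style example, a plain two-element array stands in for the V2DI vector type, the float check assumes IEEE 754 binary32, and unsigned __int128 assumes GCC or Clang on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Patch #100 pattern: AND a mask into the 32-bit image of a float.
   (Hypothetical helper; memcpy replaces the union from the example.)  */
static float
mask_float_bits (float x, uint32_t mask)
{
  uint32_t bits;
  float result;

  memcpy (&bits, &x, sizeof bits);
  bits &= mask;
  memcpy (&result, &bits, sizeof result);
  return result;
}

/* Patches #102/#103 pattern: zero extend a 64-bit value to 128 bits.
   (Hypothetical helper; unsigned __int128 is a compiler extension.)  */
static unsigned __int128
zero_extend_di_to_ti (uint64_t x)
{
  return (unsigned __int128) x;
}

/* Patch #101 pattern: extract element N and splat it to both elements.
   (Hypothetical helper; an array models the two-element vector.)  */
static void
splat_element (const int64_t in[2], unsigned int n, int64_t out[2])
{
  out[0] = in[n & 1];
  out[1] = in[n & 1];
}

/* Checks that must hold regardless of which instructions are chosen.  */
static void
check_reference_semantics (void)
{
  const int64_t in[2] = { 7, 9 };
  int64_t out[2];
  unsigned __int128 wide = zero_extend_di_to_ti (UINT64_MAX);

  /* Clearing the sign bit of -1.5f yields 1.5f (IEEE 754 assumed).  */
  assert (mask_float_bits (-1.5f, 0x7fffffffu) == 1.5f);

  /* The upper 64 bits are zero and the lower 64 bits are unchanged.  */
  assert ((uint64_t) (wide >> 64) == 0);
  assert ((uint64_t) wide == UINT64_MAX);

  splat_element (in, 1, out);
  assert (out[0] == 9 && out[1] == 9);
}
```

Calling check_reference_semantics () from a small driver and building it at different optimization levels gives a quick sanity check; it does not replace inspecting the generated assembly, which is what the new gcc.target/powerpc tests are for.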