https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81594
--- Comment #6 from Michael Meissner <meissner at gcc dot gnu.org> ---
If you look at the original patch, it did try to do this optimization. When I looked at it some time later, the combiner no longer generated the sequence because it thought it was slower (due to length, etc.).

You could spend a lot of time tuning the code so that the combiner eventually generates it again, but it was simpler to just put the peephole in to catch the cases that show up. If you want to take on the bug and do it earlier, go ahead. A peephole2 might not catch all uses, but it prevents whack-a-mole, where a change causes other code generation changes down the pike.

Note, the original patch was written in the power8 time frame, and it would need to be adjusted for power9 and future systems now (i.e. the patch only does the splitting if the value is in an FPR or GPR, while on power9 it could be in a traditional Altivec register).

However, the splitter uses reload_completed, which you always seem to object to. It could be done before register allocation, but then you would need to make sure that no other pass recombines the two separate items back into a vector once again.