https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71395

            Bug ID: 71395
           Summary: PowerPC vec_init of 4 SFmode values could be improved on
                    Power8
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The code GCC generates for combining 4 SFmode values into a V4SFmode vector
could be improved.  For example:

#include <altivec.h>

vector float
merge (float a, float b, float c, float d)
{
  return (vector float) { a, b, c, d };
}

Generates:

        .file   "foo.c"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl merge
        .section        ".opd","aw"
        .align 3
merge:
        .quad   .L.merge,.TOC.@tocbase,0
        .previous
        .type   merge, @function
.L.merge:
        addis 9,2,.LC0@toc@ha
        xxpermdi 34,2,1,0
        xxpermdi 32,4,3,0
        addi 9,9,.LC0@toc@l
        xvcvdpsp 32,32
        xvcvdpsp 34,34
        lxvd2x 33,0,9
        xxpermdi 33,33,33,2
        vperm 2,0,2,1
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   merge,.-.L.merge
        .section        .rodata.cst16,"aM",@progbits,16
        .align 4
.LC0:
        .byte   31
        .byte   30
        .byte   29
        .byte   28
        .byte   23
        .byte   22
        .byte   21
        .byte   20
        .byte   15
        .byte   14
        .byte   13
        .byte   12
        .byte   7
        .byte   6
        .byte   5
        .byte   4

If the 2 V2DF temporaries were built differently, the final combination could
be done with the VMRGEW and VMRGOW instructions instead of loading a permute
mask from memory and doing a VPERM instruction.
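
To illustrate the suggestion, here is a rough scalar model in portable C of
how a merge-word sequence might produce the vector.  It is only a sketch, not
GCC's implementation: the *_model types and functions are hypothetical names
made up for this example, elements are numbered big-endian style (element 0
first), and the choice to pair (a, c) and (b, d) in the two V2DF temporaries
is an assumption about what "built differently" could mean.  Words 1 and 3 of
the XVCVDPSP result are treated as undefined and filled with NaN so that any
accidental use of them is visible.

```c
#include <math.h>   /* NAN, used to mark lanes treated as undefined */

/* Scalar stand-ins for 128-bit registers, element 0 first. */
typedef struct { double d[2]; } v2df_model;
typedef struct { float  w[4]; } v4sf_model;

/* XXPERMDI with DM=0: take doubleword 0 of each source register.
   A scalar float argument sits in doubleword 0 in double format. */
v2df_model model_xxpermdi_dm0(double xa, double xb)
{
    v2df_model t = { { xa, xb } };
    return t;
}

/* XVCVDPSP: the two converted values land in words 0 and 2.
   Words 1 and 3 are modelled as undefined (NaN). */
v4sf_model model_xvcvdpsp(v2df_model x)
{
    v4sf_model r = { { (float)x.d[0], NAN, (float)x.d[1], NAN } };
    return r;
}

/* VMRGEW: interleave the even words (0 and 2) of the two sources. */
v4sf_model model_vmrgew(v4sf_model a, v4sf_model b)
{
    v4sf_model r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
    return r;
}

/* VMRGOW: interleave the odd words (1 and 3) of the two sources.
   Shown for completeness; it would apply if the converted values
   sat in the odd words instead. */
v4sf_model model_vmrgow(v4sf_model a, v4sf_model b)
{
    v4sf_model r = { { a.w[1], b.w[1], a.w[3], b.w[3] } };
    return r;
}

/* Assumed alternative sequence: pair (a, c) and (b, d), convert each
   pair, then merge even words.  No permute mask load is needed. */
v4sf_model vec_init_v4sf_model(float a, float b, float c, float d)
{
    v4sf_model ac = model_xvcvdpsp(model_xxpermdi_dm0(a, c));
    v4sf_model bd = model_xvcvdpsp(model_xxpermdi_dm0(b, d));
    return model_vmrgew(ac, bd);
}
```

In this model the sequence is 2 XXPERMDI, 2 XVCVDPSP and 1 VMRGEW, and the
ADDIS/ADDI/LXVD2X/XXPERMDI used to fetch the VPERM mask go away.  The word
positions on a little-endian target would need to be checked separately.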