https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71395

            Bug ID: 71395
           Summary: PowerPC vec_init of 4 SFmode values could be improved
                    on Power8
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The code GCC generates for combining four SFmode values into a V4SFmode
vector on Power8 could be improved.

For example:

#include <altivec.h>

vector float merge (float a, float b, float c, float d)
{
  return (vector float) { a, b, c, d };
}

Generates:

        .file   "foo.c"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl merge
        .section        ".opd","aw"
        .align 3
merge:
        .quad   .L.merge,.TOC.@tocbase,0
        .previous
        .type   merge, @function
.L.merge:
        addis 9,2,.LC0@toc@ha
        xxpermdi 34,2,1,0
        xxpermdi 32,4,3,0
        addi 9,9,.LC0@toc@l
        xvcvdpsp 32,32
        xvcvdpsp 34,34
        lxvd2x 33,0,9
        xxpermdi 33,33,33,2
        vperm 2,0,2,1
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   merge,.-.L.merge
        .section        .rodata.cst16,"aM",@progbits,16
        .align 4
.LC0:
        .byte   31
        .byte   30
        .byte   29
        .byte   28
        .byte   23
        .byte   22
        .byte   21
        .byte   20
        .byte   15
        .byte   14
        .byte   13
        .byte   12
        .byte   7
        .byte   6
        .byte   5
        .byte   4

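If you build the two V2DF temporaries differently, you could use the VMRGEW and
VMRGOW instructions to do the final combination instead of loading up a permute
mask and doing a VPERM instruction. That would also drop the 16-byte .LC0
constant and the ADDIS/ADDI/LXVD2X/XXPERMDI sequence that loads it.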
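
As a rough illustration of the idea, here is a portable C model of such a
sequence. It is only a sketch and not a proposed patch: the types and the
model_* helpers are hypothetical stand-ins for the instructions, not real
intrinsics. Words are numbered the way the ISA does (big-endian, word 0 is
leftmost), so a little-endian target would need the operands arranged
accordingly. The odd words after XVCVDPSP are set to zero here, and the model
simply never reads them. The assumption is that the temporaries are built as
{a, c} and {b, d}, so that one VMRGEW can interleave the even words.

```c
/* Hypothetical register models: two doublewords or four words.  */
typedef struct { double dw[2]; } v2df_model;
typedef struct { float w[4]; } v4sf_model;

/* Models XXPERMDI XT,XA,XB,0: doubleword 0 of each input, side by side.  */
static v2df_model
model_xxpermdi_0 (double xa, double xb)
{
  v2df_model r = { { xa, xb } };
  return r;
}

/* Models XVCVDPSP: each double is converted to float and placed in the
   even word of its doubleword.  The odd words are zeroed in this model
   and are never used.  */
static v4sf_model
model_xvcvdpsp (v2df_model x)
{
  v4sf_model r = { { (float) x.dw[0], 0.0f, (float) x.dw[1], 0.0f } };
  return r;
}

/* Models VMRGEW VRT,VRA,VRB: interleave the even words of VRA and VRB.  */
static v4sf_model
model_vmrgew (v4sf_model vra, v4sf_model vrb)
{
  v4sf_model r = { { vra.w[0], vrb.w[0], vra.w[2], vrb.w[2] } };
  return r;
}

/* Build { a, b, c, d } without a permute mask: pair up { a, c } and
   { b, d }, convert both, then merge the even words.  */
static v4sf_model
model_init_v4sf (float a, float b, float c, float d)
{
  v4sf_model ac = model_xvcvdpsp (model_xxpermdi_0 (a, c));
  v4sf_model bd = model_xvcvdpsp (model_xxpermdi_0 (b, d));
  return model_vmrgew (ac, bd);
}
```

In this model the whole operation is two XXPERMDI, two XVCVDPSP and one VMRGEW,
with no constant load.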
