On Sun, Nov 4, 2018 at 8:17 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Sun, Nov 4, 2018 at 8:41 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Fri, Nov 2, 2018 at 6:25 PM H.J. Lu <hongjiu...@intel.com> wrote:
> > >
> > > Remove duplicated AVX2/AVX512 vec_dup patterns and replace them with
> > > subreg. gcc.target/i386/avx2-vbroadcastss_ps256-1.c is changed by
> > >
> > > avx2_test:
> > >         .cfi_startproc
> > > -       vmovaps x(%rip), %xmm1
> > > -       vbroadcastss    %xmm1, %ymm0
> > > +       vbroadcastss    x(%rip), %ymm0
> > >         vmovaps %ymm0, y(%rip)
> > >         vzeroupper
> > >         ret
> > >         .cfi_endproc
> > >
> > > gcc.target/i386/avx512vl-vbroadcast-3.c is changed by
> > >
> > > @@ -113,7 +113,7 @@ f10:
> > >         .cfi_startproc
> > >         vmovaps %ymm0, %ymm16
> > >         vpermilps       $85, %ymm16, %ymm16
> > > -       vbroadcastss    %xmm16, %ymm16
> > > +       vshuff32x4      $0x0, %ymm16, %ymm16, %ymm16
> > >         vzeroupper
> > >         ret
> > >         .cfi_endproc
> > > @@ -153,8 +153,7 @@ f12:
> > >  f13:
> > >  .LFB12:
> > >         .cfi_startproc
> > > -       vmovaps (%rdi), %ymm16
> > > -       vbroadcastss    %xmm16, %ymm16
> > > +       vbroadcastss    (%rdi), %ymm16
> > >         vzeroupper
> > >         ret
> > >         .cfi_endproc
> >
> > Actually, we can achieve the same with pre-reload splitters. Please
> > see the attached patch for a couple of examples and a fix for
> > vbroadcastss that accesses the memory in wrong mode.
> >
>
> My patch removes a bunch of duplicated patterns from sse.md. But
> yours adds a couple more patterns. Isn't fewer patterns preferred?

Playing SUBREG games before reload does not look safe to me. We would
like to create a simpler instruction out of the combination of a vector
load and a broadcast, so I think that combine+split is the right tool
for this simplification.

BTW: Half of my proposed patch is a fix to an
avx2_pbroadcast<mode>{_1} pattern, which models the memory access
incorrectly.

Uros.
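
For illustration only (this is not taken from either patch or from the
testsuite): a minimal C sketch of the source-level shape under
discussion, a 128-bit vector load followed by a broadcast of its low
element into a 256-bit vector. It uses GCC's generic vector extensions,
so it builds without any -m flags. The names v4sf, v8sf,
broadcast_low_element and lanes_match are hypothetical. Whether a given
compiler emits a separate load plus vbroadcastss or a single
vbroadcastss from memory depends on the patterns in sse.md and on
options such as -mavx2.

```c
/* Hypothetical vector typedefs; GCC and Clang both accept vector_size.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* Globals, mirroring the x(%rip) and y(%rip) operands in the quoted asm.  */
v4sf x = { 1.5f, 2.5f, 3.5f, 4.5f };
v8sf y;

/* Load the 128-bit vector, then replicate its element 0 into all
   eight lanes of the 256-bit result.  */
void
broadcast_low_element (void)
{
  v4sf tmp = x;                          /* the vector load */
  float s = tmp[0];                      /* low element of the load */
  v8sf r = { s, s, s, s, s, s, s, s };   /* the broadcast */
  y = r;
}

/* Return 1 if every lane of y equals EXPECTED, 0 otherwise.  */
int
lanes_match (float expected)
{
  for (int i = 0; i < 8; i++)
    if (y[i] != expected)
      return 0;
  return 1;
}
```

Compiling this with -O2 -mavx2 -S and looking at the body of
broadcast_low_element is one way to see which of the two instruction
sequences a particular compiler build produces.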