On Wed, Jun 9, 2021 at 1:17 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 2:02 AM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > operands to vector broadcast from an integer with AVX2.
> > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> > won't increase stack alignment requirement and blocks transformation by
> > the combine pass.
> > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> > from memory.
> > 4. Update avx512f_cond_move.c to expect integer broadcast.
> >
> > A small benchmark:
> >
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> >
> > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> >
> > $ make
> > gcc -g -I. -O2 -c -o test.o test.c
> > gcc -g -c -o memory.o memory.S
> > gcc -g -c -o broadcast.o broadcast.S
> > gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S
> > gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> > ./test
> > memory      : 147215
> > broadcast   : 121213
> > vec_dup_sse2: 171366
> > $
> >
> > broadcast is also smaller:
> >
> > $ size memory.o broadcast.o
> >    text    data     bss     dec     hex filename
> >     132       0       0     132      84 memory.o
> >     122       0       0     122      7a broadcast.o
> > $
> Only the mov scenario was measured. When it comes to AVX512 embedded
> broadcast, it is 1 embedded-broadcast instruction vs. at least 3
> instructions: mov + broadcast + op. I'm not sure which is better.
>
> Take pr87767 for example:
>
>         vpaddd  .LC1(%rip){1to16}, %zmm0, %zmm0
> .LC1:
>         .long   3
>
> vs.
>
>         movl    $3, %eax
>         vpbroadcastd    %eax, %zmm1
>         vpaddd  %zmm1, %zmm0, %zmm0
>
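
For anyone who wants to look at the two sequences quoted above on their
own machine, a loop of roughly this shape is enough to make the
vectorizer add a broadcast constant to every lane. This is only a
sketch: the name add_const is hypothetical and it is not the actual
PR 87767 test source. Whether GCC emits the embedded broadcast from
memory or the mov + vpbroadcastd + vpaddd form depends on the compiler
version, on whether the patch is applied, and on the options used
(something like -O3 -mavx512f is assumed here).

```c
#include <stddef.h>

/* Hypothetical stand-in for a PR 87767-style loop: add the integer
   constant 3 to every element of A.  When this is vectorized for
   AVX512, the constant must be broadcast to all lanes, either from
   memory or from an integer register.  */
void
add_const (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += 3;
}
```

Compiling this with -S and inspecting the loop body shows which of the
two forms was chosen.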
https://gitlab.com/x86-benchmarks/microbenchmark/-/commits/vpaddd/broadcast

shows that vpbroadcastd is faster:

[hjl@gnu-skx-1 microbenchmark]$ make
gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory      : 425538
broadcast   : 375260
[hjl@gnu-skx-1 microbenchmark]$

-- 
H.J.