On Mon, May 31, 2021 at 8:33 PM H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, May 31, 2021 at 11:13 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Mon, May 31, 2021 at 11:07 AM Jeff Law <jeffreya...@gmail.com> wrote:
> > >
> > >
> > >
> > > On 5/31/2021 6:04 AM, H.J. Lu wrote:
> > > > On Sun, May 30, 2021 at 11:49 AM Jeff Law <jeffreya...@gmail.com> wrote:
> > > >>
> > > >>
> > > >> On 5/11/2021 5:35 PM, H.J. Lu via Gcc-patches wrote:
> > > >>> Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
> > > >>> target instructions to duplicate QImode value to TImode/OImode/XImode
> > > >>> value for memset. Define SCRATCH_SSE_REG as a scratch register for
> > > >>> ix86_gen_memset_value.
> > > >>>
> > > >>> gcc/
> > > >>>
> > > >>> PR middle-end/90773
> > > >>> * builtins.c (builtin_memset_read_str): Call
> > > >>> targetm.read_memset_value.
> > > >>> (builtin_memset_gen_str): Call targetm.gen_memset_value.
> > > >>> * target.def (read_memset_value): New hook.
> > > >>> (gen_memset_value): Likewise.
> > > >>> * targhooks.c: Include "builtins.h".
> > > >>> (default_read_memset_value): New function.
> > > >>> (default_gen_memset_value): Likewise.
> > > >>> * targhooks.h (default_read_memset_value): New prototype.
> > > >>> (default_gen_memset_value): Likewise.
> > > >>> * config/i386/i386-expand.c
> > > >>> (ix86_expand_vector_init_duplicate):
> > > >>> Make it global.
> > > >>> * config/i386/i386-protos.h
> > > >>> (ix86_minimum_incoming_stack_boundary):
> > > >>> New.
> > > >>> (ix86_expand_vector_init_duplicate): Likewise.
> > > >>> * config/i386/i386.c (ix86_minimum_incoming_stack_boundary):
> > > >>> Add
> > > >>> an argument to ignore stack_alignment_estimated. It is passed
> > > >>> as false by default.
> > > >>> (ix86_gen_memset_value_from_prev): New function.
> > > >>> (ix86_gen_memset_value): Likewise.
> > > >>> (ix86_read_memset_value): Likewise.
> > > >>> (TARGET_GEN_MEMSET_VALUE): New.
> > > >>> (TARGET_READ_MEMSET_VALUE): Likewise.
> > > >>> * config/i386/i386.h (SCRATCH_SSE_REG): New.
> > > >>> * doc/tm.texi.in: Add TARGET_READ_MEMSET_VALUE and
> > > >>> TARGET_GEN_MEMSET_VALUE hooks.
> > > >>> * doc/tm.texi: Regenerated.
> > > >>>
> > > >>> gcc/testsuite/
> > > >>>
> > > >>> PR middle-end/90773
> > > >>> * gcc.target/i386/pr90773-15.c: New test.
> > > >>> * gcc.target/i386/pr90773-16.c: Likewise.
> > > >>> * gcc.target/i386/pr90773-17.c: Likewise.
> > > >>> * gcc.target/i386/pr90773-18.c: Likewise.
> > > >>> * gcc.target/i386/pr90773-19.c: Likewise.
> > > >> Why does this need target hooks? ISTM the right way to go here is to
> > > >> just emit the constant load to the target register and let the target
> > > >> figure out how best to construct the constant into the register. If
> > > >> that means load it via QImode and broadcast, that's fine, but I'm not
> > > >> sure why that's not all implemented in the target files.
> > > >>
> > > > I will submit a patch to add optabs instead.
> > > I may be missing something, but I'm not even sure why we need special
> > > optabs.
> > >
> > > Aren't you just trying to efficiently get a constant element broadcast
> > > across an entire vector?
> >
> > Since vec_duplicate must not fail and for broadcast from a constant QImode
> > value, vec_duplicate may not be faster than a compile-time constant, I am
> > adding vec_const_duplicate. If vec_duplicate can fail, I don't need
> > vec_const_duplicate.
> >
> > --
> > H.J.
> >
> For
>
> extern void *ops;
>
> void
> foo (int c)
> {
>   __builtin_memset (ops, 4, 32);
> }
>
> without vec_const_duplicate, I got
>
>         movl    $4, %eax
>         movq    ops(%rip), %rdx
>         movd    %eax, %xmm0
>         punpcklbw       %xmm0, %xmm0
>         punpcklwd       %xmm0, %xmm0
>         pshufd  $0, %xmm0, %xmm0
>         movups  %xmm0, (%rdx)
>         movups  %xmm0, 16(%rdx)
>         ret
>
> with vec_const_duplicate, I got
>
>         movq    ops(%rip), %rax
>         movdqa  .LC0(%rip), %xmm0
>         movups  %xmm0, (%rax)
>         movups  %xmm0, 16(%rax)
>         ret

But you can construct the duplicated constant at compile-time?  I thought
the issue was that a constant pool load is _not_ the most efficient variant?

>
> --
> H.J.
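For readers following the thread, the two strategies under discussion can be
sketched in portable C. This is only an illustration under stated assumptions,
not GCC's implementation: the helper names (broadcast_byte, dup4_pattern,
fill32_from_pattern, strategies_agree) are hypothetical, the runtime broadcast
uses a 64-bit multiply rather than the punpcklbw/punpcklwd/pshufd sequence, and
the constant array merely stands in for the .LC0 constant-pool entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime broadcast (hypothetical helper): replicate one byte into all
   eight bytes of a 64-bit word.  Multiplying by 0x0101...01 copies the
   byte into every byte lane.  */
static uint64_t broadcast_byte(uint8_t c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Compile-time duplicate (hypothetical): the pattern for the known value 4
   already exists as read-only data, comparable to a constant-pool entry.  */
static const uint8_t dup4_pattern[16] = {
    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
};

/* Fill 32 bytes with two 16-byte copies of the pattern, mirroring the
   two 16-byte stores in the quoted assembly.  */
static void fill32_from_pattern(void *dst, const uint8_t pattern[16])
{
    memcpy(dst, pattern, 16);
    memcpy((char *)dst + 16, pattern, 16);
}

/* Return 1 when both strategies produce the same bytes as
   memset (buf, 4, 32).  */
static int strategies_agree(void)
{
    uint8_t expect[32], a[32], b[32];
    memset(expect, 4, sizeof expect);

    /* Strategy 1: build the value at run time, then store it four times.  */
    uint64_t w = broadcast_byte(4);
    for (size_t i = 0; i < 4; i++)
        memcpy(a + 8 * i, &w, sizeof w);

    /* Strategy 2: copy from the precomputed constant.  */
    fill32_from_pattern(b, dup4_pattern);

    return memcmp(a, expect, sizeof expect) == 0
           && memcmp(b, expect, sizeof expect) == 0;
}
```

Both paths yield identical memory contents; the open question in the thread is
only which one is cheaper to materialize on a given target.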