On Mon, Aug 22, 2016 at 4:31 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Aug 19, 2016 at 4:45 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >> On Fri, Aug 19, 2016 at 2:21 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Thu, Aug 18, 2016 at 5:16 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>> On Thu, Aug 18, 2016 at 1:18 AM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Wed, Aug 17, 2016 at 10:11 PM, H.J. Lu <hongjiu...@intel.com> wrote: >>>>>> builtin_memset_gen_str returns a register used for memset, which only >>>>>> supports integer registers. But a target may use vector registers in >>>>>> memmset. This patch adds a TARGET_GEN_MEMSET_VALUE hook to duplicate >>>>>> QImode value to mode derived from STORE_MAX_PIECES, which can be used >>>>>> with vector instructions. The default hook is the same as the original >>>>>> builtin_memset_gen_str. A target can override it to support vector >>>>>> instructions for STORE_MAX_PIECES. >>>>>> >>>>>> Tested on x86-64 and i686. Any comments? >>>>>> >>>>>> H.J. >>>>>> --- >>>>>> gcc/ >>>>>> >>>>>> * builtins.c (builtin_memset_gen_str): Call >>>>>> targetm.gen_memset_value. >>>>>> (default_gen_memset_value): New function. >>>>>> * target.def (gen_memset_value): New hook. >>>>>> * targhooks.c: Inclue "expmed.h" and "builtins.h". >>>>>> (default_gen_memset_value): New function. >>>>> >>>>> I see default_gen_memset_value in builtins.c but it belongs here. >>>>> >>>>>> * targhooks.h (default_gen_memset_value): New prototype. >>>>>> * config/i386/i386.c (ix86_gen_memset_value): New function. >>>>>> (TARGET_GEN_MEMSET_VALUE): New. >>>>>> * config/i386/i386.h (STORE_MAX_PIECES): Likewise. >>>>>> * doc/tm.texi.in: Add TARGET_GEN_MEMSET_VALUE hook. >>>>>> * doc/tm.texi: Updated. >>>>>> >>>> >>>> Like this? >>> >>> Aww, ok - I see builtins.c is a better place - sorry for the extra work. >>> >>> Richard. >> >> Here is the original patch with the updated ChangeLog. OK for trunk? > > So now actually looking at what you are doing. Why do we need a new > hook? Is it that even if we make builtin_memset_gen_str handle vector-mode > MODE properly you don't get the desired optimized code? Your patch doesn't
It is hard to say what codes builtin_memset_gen_str will generate if vector mode is supported. I suspect since broadcast instructions are target specific, a target hook may still be needed to broadcast a QImode input to a full vector register. > seem to change the mode to vector but only STORE_MAX_PIECES and > op_by_pieces_d::run will only consider integer modes. My patch doesn't change mode of STORE_MAX_PIECES. It changes how TImode register from STORE_MAX_PIECES is generated: (insn 8 7 9 (set (reg:SI 90) (zero_extend:SI (reg:QI 89))) c1.i:4 -1 (nil)) (insn 9 8 10 (parallel [ (set (reg:SI 91) (mult:SI (reg:SI 90) (const_int 16843009 [0x1010101]))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 10 9 11 (set (reg:V4SI 92) (vec_duplicate:V4SI (reg:SI 91))) c1.i:4 4197 {*vec_dupv4si} (nil)) (insn 11 10 12 (set (subreg:V4SI (reg:TI 93) 0) (reg:V4SI 92)) c1.i:4 -1 (nil)) (insn 12 11 13 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void *)dst_2(D)]+0 S16 A8]) (reg:TI 93)) c1.i:4 -1 (nil)) vs. (insn 8 7 9 (set (reg:DI 91) (zero_extend:DI (reg:QI 89))) c1.i:4 -1 (nil)) (insn 9 8 10 (set (subreg:DI (reg:TI 90) 0) (reg:DI 91)) c1.i:4 -1 (nil)) (insn 10 9 11 (set (subreg:DI (reg:TI 90) 8) (const_int 0 [0])) c1.i:4 -1 (nil)) (insn 11 10 12 (set (reg:DI 93) (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 (nil)) (insn 12 11 13 (parallel [ (set (reg:DI 92) (mult:DI (subreg:DI (reg:TI 90) 8) (reg:DI 93))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 13 12 14 (set (reg:DI 95) (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 (nil)) (insn 14 13 15 (parallel [ (set (reg:DI 94) (mult:DI (subreg:DI (reg:TI 90) 0) (reg:DI 95))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 15 14 16 (parallel [ (set (reg:DI 96) (plus:DI (reg:DI 92) (reg:DI 94))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 16 15 17 (set (reg:DI 98) (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 (nil)) (insn 17 16 18 (parallel [ (set (reg:TI 97) (mult:TI (zero_extend:TI (subreg:DI (reg:TI 90) 0)) (zero_extend:TI (reg:DI 98)))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 18 17 19 (parallel [ (set (reg:DI 99) (plus:DI (reg:DI 96) (subreg:DI (reg:TI 97) 8))) (clobber (reg:CC 17 flags)) ]) c1.i:4 -1 (nil)) (insn 19 18 20 (set (subreg:DI (reg:TI 97) 8) (reg:DI 99)) c1.i:4 -1 (nil)) (insn 20 19 21 (set (reg:TI 97) (reg:TI 97)) c1.i:4 -1 (expr_list:REG_EQUAL (mult:TI (reg:TI 90) (const_wide_int 0x1010101010101010101010101010101)) (nil))) (insn 21 20 22 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void *)dst_2(D)]+0 S16 A8]) (reg:TI 97)) c1.i:4 -1 (nil)) > So isn't this a general missed optimization that TImode 0x01010101...01 * x > is not optimized to a pxor or broadcast? In general, vector instructions aren't supported in integer computation. TImode 0x01010101...01 * x will use integer registers not vector registers. My target hook simply allows vector registers to be used in TImode for STORE_MAX_PIECES. -- H.J.