On Mon, Aug 22, 2016 at 4:31 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Aug 19, 2016 at 4:45 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Fri, Aug 19, 2016 at 2:21 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Thu, Aug 18, 2016 at 5:16 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>> On Thu, Aug 18, 2016 at 1:18 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Wed, Aug 17, 2016 at 10:11 PM, H.J. Lu <hongjiu...@intel.com> wrote:
>>>>>> builtin_memset_gen_str returns a register used for memset, which only
>>>>>> supports integer registers.  But a target may use vector registers in
>>>>>> memmset.  This patch adds a TARGET_GEN_MEMSET_VALUE hook to duplicate
>>>>>> QImode value to mode derived from STORE_MAX_PIECES, which can be used
>>>>>> with vector instructions.  The default hook is the same as the original
>>>>>> builtin_memset_gen_str.  A target can override it to support vector
>>>>>> instructions for STORE_MAX_PIECES.
>>>>>>
>>>>>> Tested on x86-64 and i686.  Any comments?
>>>>>>
>>>>>> H.J.
>>>>>> ---
>>>>>> gcc/
>>>>>>
>>>>>>         * builtins.c (builtin_memset_gen_str): Call 
>>>>>> targetm.gen_memset_value.
>>>>>>         (default_gen_memset_value): New function.
>>>>>>         * target.def (gen_memset_value): New hook.
>>>>>>         * targhooks.c: Inclue "expmed.h" and "builtins.h".
>>>>>>         (default_gen_memset_value): New function.
>>>>>
>>>>> I see default_gen_memset_value in builtins.c but it belongs here.
>>>>>
>>>>>>         * targhooks.h (default_gen_memset_value): New prototype.
>>>>>>         * config/i386/i386.c (ix86_gen_memset_value): New function.
>>>>>>         (TARGET_GEN_MEMSET_VALUE): New.
>>>>>>         * config/i386/i386.h (STORE_MAX_PIECES): Likewise.
>>>>>>         * doc/tm.texi.in: Add TARGET_GEN_MEMSET_VALUE hook.
>>>>>>         * doc/tm.texi: Updated.
>>>>>>
>>>>
>>>> Like this?
>>>
>>> Aww, ok - I see builtins.c is a better place - sorry for the extra work.
>>>
>>> Richard.
>>
>> Here is the original patch with the updated ChangeLog.  OK for trunk?
>
> So now actually looking at what you are doing.  Why do we need a new
> hook?  Is it that even if we make builtin_memset_gen_str handle vector-mode
> MODE properly you don't get the desired optimized code?  Your patch doesn't

It is hard to say what codes builtin_memset_gen_str will generate
if vector mode is supported.  I suspect since broadcast instructions are
target specific, a target hook may still be needed to broadcast a QImode
input to a full vector register.

> seem to change the mode to vector but only STORE_MAX_PIECES and
> op_by_pieces_d::run will only consider integer modes.

My patch doesn't change mode of STORE_MAX_PIECES.  It changes
how TImode register from STORE_MAX_PIECES is generated:

(insn 8 7 9 (set (reg:SI 90)
        (zero_extend:SI (reg:QI 89))) c1.i:4 -1
     (nil))

(insn 9 8 10 (parallel [
            (set (reg:SI 91)
                (mult:SI (reg:SI 90)
                    (const_int 16843009 [0x1010101])))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 10 9 11 (set (reg:V4SI 92)
        (vec_duplicate:V4SI (reg:SI 91))) c1.i:4 4197 {*vec_dupv4si}
     (nil))

(insn 11 10 12 (set (subreg:V4SI (reg:TI 93) 0)
        (reg:V4SI 92)) c1.i:4 -1
     (nil))

(insn 12 11 13 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
*)dst_2(D)]+0 S16
A8])
        (reg:TI 93)) c1.i:4 -1
     (nil))

vs.

(insn 8 7 9 (set (reg:DI 91)
        (zero_extend:DI (reg:QI 89))) c1.i:4 -1
     (nil))

(insn 9 8 10 (set (subreg:DI (reg:TI 90) 0)
        (reg:DI 91)) c1.i:4 -1
     (nil))

(insn 10 9 11 (set (subreg:DI (reg:TI 90) 8)
        (const_int 0 [0])) c1.i:4 -1
     (nil))

(insn 11 10 12 (set (reg:DI 93)
        (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
     (nil))

(insn 12 11 13 (parallel [
            (set (reg:DI 92)
                (mult:DI (subreg:DI (reg:TI 90) 8)
                    (reg:DI 93)))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 13 12 14 (set (reg:DI 95)
        (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
     (nil))

(insn 14 13 15 (parallel [
            (set (reg:DI 94)
                (mult:DI (subreg:DI (reg:TI 90) 0)
                    (reg:DI 95)))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 15 14 16 (parallel [
            (set (reg:DI 96)
                (plus:DI (reg:DI 92)
                    (reg:DI 94)))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 16 15 17 (set (reg:DI 98)
        (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
     (nil))

(insn 17 16 18 (parallel [
            (set (reg:TI 97)
                (mult:TI (zero_extend:TI (subreg:DI (reg:TI 90) 0))
                    (zero_extend:TI (reg:DI 98))))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 18 17 19 (parallel [
            (set (reg:DI 99)
                (plus:DI (reg:DI 96)
                    (subreg:DI (reg:TI 97) 8)))
            (clobber (reg:CC 17 flags))
        ]) c1.i:4 -1
     (nil))

(insn 19 18 20 (set (subreg:DI (reg:TI 97) 8)
        (reg:DI 99)) c1.i:4 -1
     (nil))

(insn 20 19 21 (set (reg:TI 97)
        (reg:TI 97)) c1.i:4 -1
     (expr_list:REG_EQUAL (mult:TI (reg:TI 90)
            (const_wide_int 0x1010101010101010101010101010101))
        (nil)))

(insn 21 20 22 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
*)dst_2(D)]+0 S16
A8])
        (reg:TI 97)) c1.i:4 -1
     (nil))


> So isn't this a general missed optimization that TImode 0x01010101...01 * x
> is not optimized to a pxor or broadcast?

In general, vector instructions aren't supported in integer computation.
TImode 0x01010101...01 * x will use integer registers not  vector registers.
My target hook simply allows vector registers to be used in TImode for
STORE_MAX_PIECES.

-- 
H.J.

Reply via email to