On 31/10/16 11:54, Kyrill Tkachov wrote:
> 
> On 24/10/16 17:15, Andrew Pinski wrote:
>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>> Hi all,
>>>
>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>> currently synthesize the repeating 32-bit pattern twice and perform a
>>> single X-store.
>>> With this patch we synthesize the 32-bit pattern once into a W register
>>> and store that twice using an STP. This reduces codesize bloat from
>>> synthesising the same constant multiple times at the expense of
>>> converting a store to a store-pair.
>>> It will only trigger if we can save two or more instructions, so it will
>>> only transform:
>>>          mov     x1, 49370
>>>          movk    x1, 0xc0da, lsl 32
>>>          str     x1, [x0]
>>>
>>> into:
>>>
>>>          mov     w1, 49370
>>>          stp     w1, w1, [x0]
>>>
>>> when optimising for -Os, whereas it will always transform a 4-insn
>>> synthesis sequence into a two-insn sequence + STP (see comments in the
>>> patch).
>>>
>>> This patch triggers already but will trigger more with the store merging
>>> pass that I'm working on since that will generate more of these repeating
>>> 64-bit constants.
>>> This helps improve codegen on 456.hmmer where store merging can sometimes
>>> create very complex repeating constants and target-specific expand needs
>>> to break them down.
>>
>> Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>> for slow unaligned store pair word is with the pattern or not.
> 
> I can't get it to ICE with -mcpu=thunderx.
> The restriction is just on the STP forming code in the sched-fusion
> hooks AFAIK.
> 
>> Basically the rule is
>> 1) if 4 byte aligned, then it is better to do two str.
>> 2) If 8 byte aligned, then doing stp is good
>> 3) Otherwise it is better to do two str.
> 
> Ok, then I'll make the function just emit two stores and depend on the
> sched-fusion machinery to fuse them into an STP when appropriate since
> that has the logic that takes thunderx into account.

If the mode is DImode (i.e. the pattern is 'movdi'), then surely we must
have a 64-bit aligned store.

R.

> 
> Thanks for the info.
> Kyrill
> 
>>
>> Thanks,
>> Andrew
>>
>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      * config/aarch64/aarch64.md (mov<mode>): Call
>>>      aarch64_split_dimode_const_store on DImode constant stores.
>>>      * config/aarch64/aarch64-protos.h
>>>      (aarch64_split_dimode_const_store): New prototype.
>>>      * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>      function.
>>>
>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>      * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
> 

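For readers following the thread, the trigger condition described in the
original patch mail can be sketched roughly as below. This is not the code
from the patch: the names should_split_repeating_store and count_mov_insns
are hypothetical, and the cost model is a deliberate simplification (one
MOV/MOVK per non-zero 16-bit chunk, ignoring MOVN and bitmask immediates
that the real aarch64 backend also considers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cost model: one MOV/MOVK per non-zero 16-bit
   chunk, with a minimum of one instruction.  The real backend can do
   better using MOVN or bitmask immediates; that is ignored here.  */
static int
count_mov_insns (uint64_t value, int num_chunks)
{
  int insns = 0;
  for (int i = 0; i < num_chunks; i++)
    if ((value >> (16 * i)) & 0xffff)
      insns++;
  return insns ? insns : 1;
}

/* Return true if storing IMM as "synthesize the 32-bit half once + STP"
   looks profitable compared with "synthesize all 64 bits + STR".
   Both forms use one store, so the saving is the difference in
   synthesis instructions.  */
static bool
should_split_repeating_store (uint64_t imm, bool optimize_size)
{
  uint32_t lo = (uint32_t) imm;
  uint32_t hi = (uint32_t) (imm >> 32);

  /* Only constants with equal bottom and top halves qualify.  */
  if (lo != hi)
    return false;

  int saved = count_mov_insns (imm, 4) - count_mov_insns (lo, 2);

  /* Saving two or more instructions always pays off; a saving of one
     instruction is only taken when optimising for size (-Os).  */
  return saved >= 2 || (optimize_size && saved >= 1);
}
```

Under this model the 0x0000c0da0000c0da example from the mail saves one
instruction, so it is only split at -Os, while a constant needing four
MOV/MOVK instructions is always split.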