On 31/10/16 11:54, Kyrill Tkachov wrote:
>
> On 24/10/16 17:15, Andrew Pinski wrote:
>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>> Hi all,
>>>
>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>> currently synthesize the repeating 32-bit pattern twice and perform a
>>> single X-store.
>>> With this patch we synthesize the 32-bit pattern once into a W register
>>> and store that twice using an STP. This reduces codesize bloat from
>>> synthesising the same constant multiple times at the expense of
>>> converting a store to a store-pair.
>>> It will only trigger if we can save two or more instructions, so it will
>>> only transform:
>>> mov x1, 49370
>>> movk x1, 0xc0da, lsl 32
>>> str x1, [x0]
>>>
>>> into:
>>>
>>> mov w1, 49370
>>> stp w1, w1, [x0]
>>>
>>> when optimising for -Os, whereas it will always transform a 4-insn
>>> synthesis sequence into a two-insn sequence + STP (see comments in the
>>> patch).
>>>
>>> This patch triggers already but will trigger more with the store merging
>>> pass that I'm working on since that will generate more of these
>>> repeating 64-bit constants.
>>> This helps improve codegen on 456.hmmer where store merging can
>>> sometimes create very complex repeating constants and target-specific
>>> expand needs to break them down.
>>
>> Doing STP might be worse on ThunderX 1 than the mov/movk. Or this
>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>> for slow unaligned store pair word is with the pattern or not.
>
> I can't get it to ICE with -mcpu=thunderx.
> The restriction is just on the STP forming code in the sched-fusion
> hooks AFAIK.
>
>> Basically the rule is
>> 1) if 4 byte aligned, then it is better to do two str.
>> 2) If 8 byte aligned, then doing stp is good
>> 3) Otherwise it is better to do two str.
>
> Ok, then I'll make the function just emit two stores and depend on the
> sched-fusion machinery to fuse them into an STP when appropriate since
> that has the logic that takes thunderx into account.

If the mode is DImode (i.e. the pattern is 'movdi'), then surely we must
have a 64-bit aligned store.

R.

>
> Thanks for the info.
> Kyrill
>
>>
>> Thanks,
>> Andrew
>>
>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-10-24 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>>
>>> * config/aarch64/aarch64.md (mov<mode>): Call
>>> aarch64_split_dimode_const_store on DImode constant stores.
>>> * config/aarch64/aarch64-protos.h
>>> (aarch64_split_dimode_const_store): New prototype.
>>> * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>> function.
>>>
>>> 2016-10-24 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>>
>>> * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>> * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>
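
For anyone following the instruction-count argument in the quoted patch description, here is a minimal C sketch of the check involved. It is not the code from the patch: the helper names repeating_halves_p and naive_mov_count are hypothetical, and the count assumes a naive mov/movk synthesis with one instruction per non-zero 16-bit chunk, ignoring the bitmask-immediate and MOVN forms the real backend can use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): true if the low and high
   32-bit halves of VAL are identical.  */
bool
repeating_halves_p (uint64_t val)
{
  return (uint32_t) val == (uint32_t) (val >> 32);
}

/* Hypothetical estimate (not from the patch): number of mov/movk
   instructions needed to build the low BITS bits of VAL, assuming one
   instruction per non-zero 16-bit chunk and at least one instruction
   overall.  BITS is expected to be 32 or 64.  */
int
naive_mov_count (uint64_t val, int bits)
{
  int count = 0;

  for (int shift = 0; shift < bits; shift += 16)
    if ((val >> shift) & 0xffff)
      count++;

  return count ? count : 1;
}
```

For 0x0000c0da0000c0da the sketch counts two instructions for the full 64-bit value and one for the 32-bit half, which lines up with the mov/movk versus single mov sequences quoted above. For a constant such as 0x1234567812345678 it counts four versus two, the "4-insn synthesis sequence" case.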