On Tue, Jun 06, 2017 at 09:35:10AM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)
>
> For the testcase in this patch on aarch64 this allows us to try matching
> during combine the pattern:
> (set (reg:V2DI 78 [ x ])
>      (vec_concat:V2DI
>        (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
>        (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                         (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))
>
> rather than the more complex:
> (set (reg:V2DI 78 [ x ])
>      (vec_merge:V2DI
>        (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                                             (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
>        (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
>        (const_int 2 [0x2])))
>
> We don't actually have an aarch64 pattern for the simplified version above,
> but it's a simple enough form to add, so this patch adds such a pattern that
> performs a concatenated load of two 64-bit vectors in adjacent memory
> locations as a single Q-register LDR. The new aarch64 pattern is needed to
> demonstrate the effectiveness of the simplify-rtx change, so I've kept them
> together as one patch.
>
> Now for the testcase in the patch we can generate:
> construct_lanedi:
>         ldr     q0, [x0]
>         ret
>
> construct_lanedf:
>         ldr     q0, [x0]
>         ret
>
> instead of:
> construct_lanedi:
>         ld1r    {v0.2d}, [x0]
>         ldr     x0, [x0, 8]
>         ins     v0.d[1], x0
>         ret
>
> construct_lanedf:
>         ld1r    {v0.2d}, [x0]
>         ldr     d1, [x0, 8]
>         ins     v0.d[1], v1.d[0]
>         ret
>
> The new memory constraint Utq is needed because we need to allow only the
> Q-register addressing modes but the MEM expressions in the RTL pattern have
> 64-bit vector modes, and if we don't constrain them they will allow the
> D-register addressing modes during register allocation/address mode
> selection, which will produce invalid assembly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

I'd appreciate a more thorough set of tests, looking at some of the vector
mode combines that you now permit. I'm (only a little) nervous that you only
test here for the DI/DFmode cases and that there is no testing for V2SI etc.,
nor tests for strict align, nor tests for when the addresses must block the
transformation.

The patch itself looks OK, but it could do with better tests.

Thanks,
James