On Tue, Jun 06, 2017 at 09:35:10AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)
> 
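For reference, the identity above can be checked with a small scalar model of the RTL operations. This is only a sketch: the `vec2` type and the helper functions are hypothetical stand-ins for the RTL codes, and it assumes the usual vec_merge convention that bit i of the mask selects lane i from the first operand, otherwise from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane vector model; lane[0] is the first element. */
typedef struct { int64_t lane[2]; } vec2;

/* Models (vec_duplicate x): both lanes hold x. */
static vec2 vec_duplicate(int64_t x)
{
  vec2 v = {{x, x}};
  return v;
}

/* Models (vec_concat lo hi): lane 0 is lo, lane 1 is hi. */
static vec2 vec_concat(int64_t lo, int64_t hi)
{
  vec2 v = {{lo, hi}};
  return v;
}

/* Models (vec_merge a b mask): lane i comes from a when bit i of mask
   is set, otherwise from b.  */
static vec2 vec_merge(vec2 a, vec2 b, unsigned mask)
{
  assert(mask < 4); /* only two lanes, so only two mask bits */
  vec2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* Lane-wise equality, used to compare both sides of the identity. */
static int vec_equal(vec2 a, vec2 b)
{
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

With x, y, z as distinct values, `vec_merge(vec_duplicate(x), vec_concat(y, z), 1)` compares equal to `vec_concat(x, z)`, and the same call with mask 2 compares equal to `vec_concat(y, x)`.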
> For the testcase in this patch on aarch64 this allows us to try matching,
> during combine, the pattern:
> (set (reg:V2DI 78 [ x ])
>     (vec_concat:V2DI
>         (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
>         (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                 (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))
> 
> rather than the more complex:
> (set (reg:V2DI 78 [ x ])
>     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                     (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
>         (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
>         (const_int 2 [0x2])))
> 
> We don't actually have an aarch64 pattern for the simplified version above,
> but it's a simple enough form to add, so this patch adds such a pattern that
> performs a concatenated load of two 64-bit vectors in adjacent memory
> locations as a single Q-register LDR. The new aarch64 pattern is needed to
> demonstrate the effectiveness of the simplify-rtx change, so I've kept them
> together as one patch.
> 
> Now for the testcase in the patch we can generate:
> construct_lanedi:
>         ldr     q0, [x0]
>         ret
> 
> construct_lanedf:
>         ldr     q0, [x0]
>         ret
> 
> instead of:
> construct_lanedi:
>         ld1r    {v0.2d}, [x0]
>         ldr     x0, [x0, 8]
>         ins     v0.d[1], x0
>         ret
> 
> construct_lanedf:
>         ld1r    {v0.2d}, [x0]
>         ldr     d1, [x0, 8]
>         ins     v0.d[1], v1.d[0]
>         ret
> 
> The new memory constraint Utq is needed because we need to allow only the
> Q-register addressing modes but the MEM expressions in the RTL pattern have
> 64-bit vector modes, and if we don't constrain them they will allow the
> D-register addressing modes during register allocation/address mode
> selection, which will produce invalid assembly.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

I'd appreciate a more thorough set of tests, looking at some of the vector
mode combines that you now permit. I'm (only a little) nervous that you only
test here for the DI/DFmode cases and that there is no testing for
V2SI etc., nor tests for strict align, nor tests for when the addresses
must block the transformation.

The patch itself looks OK, but it could do with better tests.
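
To make that concrete, the extra coverage might look something like the sketch below. It is illustrative only and is not taken from the patch: the function names, the `load_v2si` and `check_v4si` helpers, and the use of GNU vector extensions instead of arm_neon.h intrinsics are all assumptions. A real test would also need the usual dg- directives and scan-assembler checks, plus a variant built with -mstrict-align.

```c
#include <assert.h>
#include <string.h>

/* GNU vector extension types: 64-bit and 128-bit vectors of int. */
typedef int v2si __attribute__ ((vector_size (8)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: load two ints from P without assuming that P
   has vector alignment.  */
static v2si load_v2si (const int *p)
{
  v2si v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Positive case: the two V2SI halves are adjacent in memory, so this
   is a candidate for a single Q-register load.  */
static v4si construct_v4si_adjacent (const int *y)
{
  v2si lo = load_v2si (y);
  v2si hi = load_v2si (y + 2);
  v4si r = { lo[0], lo[1], hi[0], hi[1] };
  return r;
}

/* Negative case: there is a gap between the halves, so the addresses
   must block the transformation.  */
static v4si construct_v4si_gap (const int *y)
{
  v2si lo = load_v2si (y);
  v2si hi = load_v2si (y + 4);
  v4si r = { lo[0], lo[1], hi[0], hi[1] };
  return r;
}

/* Hypothetical runtime check for an execution-test variant.  */
static void check_v4si (v4si v, int a, int b, int c, int d)
{
  assert (v[0] == a && v[1] == b && v[2] == c && v[3] == d);
}
```

The adjacent case is where a single `ldr q0, [x0]` would be expected. The gap case should keep two separate loads, and a third variant with the halves swapped in memory order would be worth having for the same reason.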

Thanks,
James
