Hi Christophe,
On 15/11/17 15:31, Christophe Lyon wrote:
Hi Kyrill,
On 8 November 2017 at 19:34, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
On 06/06/17 14:17, James Greenhalgh wrote:
On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:
Hi all,
On top of the previous vec_merge simplifications [1] we can add this
pattern to perform
a store of a vec_concat of two 64-bit values in distinct registers as an
STP.
This avoids constructing such a vector explicitly in a register and
storing it as
a Q register.
This way for the code in the testcase we can generate:
construct_lane_1:
ldp d1, d0, [x0]
fmov d3, 1.0e+0
fmov d2, 2.0e+0
fadd d4, d1, d3
fadd d5, d0, d2
stp d4, d5, [x1, 32]
ret
construct_lane_2:
ldp x2, x0, [x0]
add x3, x2, 1
add x4, x0, 2
stp x3, x4, [x1, 32]
ret
instead of the current:
construct_lane_1:
ldp d0, d1, [x0]
fmov d3, 1.0e+0
fmov d2, 2.0e+0
fadd d0, d0, d3
fadd d1, d1, d2
dup v0.2d, v0.d[0]
ins v0.d[1], v1.d[0]
str q0, [x1, 32]
ret
construct_lane_2:
ldp x2, x3, [x0]
add x0, x2, 1
add x2, x3, 2
dup v0.2d, x0
ins v0.d[1], x2
str q0, [x1, 32]
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?
OK.
Thanks, I've committed this and the other patches in this series after
rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
The only conflict from updating the patch was that I had to use the store_16
attribute rather than
the old store2 for the new define_insn. This is what I've committed with
r254551.
Sorry for the delay in committing.
I've noticed that the new tests fail when testing with -mabi=ilp32:
FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)
Sorry if this has been reported before.
Thank you for reporting this, I was not aware of it.
My patch does indeed fail to generate the optimised sequence for
-mabi=ilp32.
During combine it fails to match:
Failed to match this instruction:
(set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ])
(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16
A128])
(vec_concat:V2DF (reg:DF 81 [ y0 ])
(reg:DF 84 [ y1 ])))
but without the -mabi=ilp32 it does successfully match the equivalent
(set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ])
(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16
A128])
(vec_concat:V2DF (reg:DF 81 [ y0 ])
(reg:DF 84 [ y1 ])))
The only difference is the index register being the hard reg x1.
There's probably some subtlety in aarch64_classify_address that I'll
need to dig into.
In any case, can you please open a bug report for this so we can track it?
To be clear, the failure is just suboptimal codegen for the -mabi=ilp32
case, not a wrong-code or ICE
(though it should still be fixed, of course).
Thanks again,
Kyrill
Christophe
Kyrill
Thanks,
James
2017-06-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* config/aarch64/aarch64-simd.md (store_pair_lanes<mode>):
New pattern.
* config/aarch64/constraints.md (Uml): New constraint.
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
predicate.
2017-06-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* gcc.target/aarch64/store_v2vec_lanes.c: New test.