Hi,

Recent refactoring of the arm_neon.h header enabled better code
generation for intrinsics that manipulate vector structures, and new
tests were added to verify the benefit of these changes. It now
transpires that the code generation improvements are observed only on
little-endian systems. This patch restricts the code generation tests
to little-endian targets for now.
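
For readers without the attachment, a restriction of this kind is usually
written as a target selector on the dg-final directive. The fragment below
is only a sketch of that form, not the contents of the attached patch. It
assumes the aarch64_little_endian effective-target keyword from the
testsuite's target-supports.exp and reuses the "mov\\t" pattern from the
failure report quoted further down.

```c
/* Sketch only: assumed form of the restricted check, not taken from
   the attached patch.  The selector makes the scan run on
   little-endian AArch64 targets and skips it elsewhere.  */
/* { dg-final { scan-assembler-not "mov\\t" { target aarch64_little_endian } } } */
```

With a selector like this the scan is simply not run on aarch64_be, so the
test no longer reports a FAIL there while the underlying big-endian code
generation is left unchanged.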

Ok for master?

Thanks,
Jonathan

---

gcc/testsuite/ChangeLog:

2021-08-04  Jonathan Wright  <jonathan.wri...@arm.com>

        * gcc.target/aarch64/vector_structure_intrinsics.c: Restrict
        tests to little-endian targets.



From: Christophe Lyon <christophe.lyon....@gmail.com>
Sent: 03 August 2021 10:42
To: Jonathan Wright <jonathan.wri...@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Richard Sandiford 
<richard.sandif...@arm.com>
Subject: Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in 
vqtbl[234] intrinsics 
 


On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches 
<gcc-patches@gcc.gnu.org> wrote:
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation: previously, superfluous move
instructions were emitted for every register extraction/set in this
additional structure.
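
The difference between the two copying styles can be sketched in portable
C. This is an illustration under stated assumptions, not the arm_neon.h
code: vec128_t, vec128x2_t and opaque_oi_t are hypothetical stand-ins for
int8x16_t, int8x16x2_t and __builtin_aarch64_simd_oi, and the per-vector
variant only models the old "one vector at a time" construction rather
than calling the real set-register builtins. It shows that both styles
produce the same bytes; it does not reproduce the move instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from arm_neon.h.  */
typedef struct { uint8_t lane[16]; } vec128_t;          /* like int8x16_t */
typedef struct { vec128_t val[2]; } vec128x2_t;         /* like int8x16x2_t */
typedef struct { unsigned char raw[32]; } opaque_oi_t;  /* like the opaque OI type */

_Static_assert (sizeof (vec128x2_t) == sizeof (opaque_oi_t),
                "structure and opaque container must have the same size");

/* Old style (modelled): fill the opaque container one vector at a time.  */
static opaque_oi_t
pack_per_vector (vec128x2_t tab)
{
  opaque_oi_t o;
  for (size_t i = 0; i < 2; i++)
    memcpy (o.raw + i * sizeof (vec128_t), &tab.val[i], sizeof (vec128_t));
  return o;
}

/* New style: a single copy of the whole vector structure.  */
static opaque_oi_t
pack_whole (vec128x2_t tab)
{
  opaque_oi_t o;
  memcpy (&o, &tab, sizeof o);
  return o;
}

/* Returns 1 when both packing styles yield identical bytes.  */
static int
packings_agree (vec128x2_t tab)
{
  opaque_oi_t a = pack_per_vector (tab);
  opaque_oi_t b = pack_whole (tab);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

The single-copy form gives the compiler one whole-object copy to optimise
instead of a chain of per-register updates, which is the property the
patch relies on.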

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
        instead of constructing __builtin_aarch64_simd_oi one vector
        at a time.
        (vqtbl2_u8): Likewise.
        (vqtbl2_p8): Likewise.
        (vqtbl2q_s8): Likewise.
        (vqtbl2q_u8): Likewise.
        (vqtbl2q_p8): Likewise.
        (vqtbl3_s8): Use __builtin_memcpy instead of constructing
        __builtin_aarch64_simd_ci one vector at a time.
        (vqtbl3_u8): Likewise.
        (vqtbl3_p8): Likewise.
        (vqtbl3q_s8): Likewise.
        (vqtbl3q_u8): Likewise.
        (vqtbl3q_p8): Likewise.
        (vqtbl4_s8): Use __builtin_memcpy instead of constructing
        __builtin_aarch64_simd_xi one vector at a time.
        (vqtbl4_u8): Likewise.
        (vqtbl4_p8): Likewise.
        (vqtbl4q_s8): Likewise.
        (vqtbl4q_u8): Likewise.
        (vqtbl4q_p8): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vector_structure_intrinsics.c: New test.

Hi,

This new test fails on aarch64_be:
 FAIL: gcc.target/aarch64/vector_structure_intrinsics.c scan-assembler-not mov\\t

Can you check?

Thanks

Christophe

Attachment: rb14749.patch