Hi all,

I finally got around to trying out the define_subst approach for
PR target/99195.
The problem we have is that many Advanced SIMD instructions have 64-bit vector
variants that clear the top half of the 128-bit Q register. Exploiting this
would allow the compiler to avoid generating explicit zeroing instructions to
concat the 64-bit result with zeroes for code like:
vcombine_u16(vadd_u16(a, b), vdup_n_u16(0))
We've been getting user reports of GCC missing this optimisation in real-world
code, so it's worth doing something about it.
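
For reference, here is a minimal, self-contained sketch of the kind of source
this is about. The function names (add_zero_high, add_zero_high_ref,
check_add_zero_high) are made up for illustration and are not taken from the
patch or its testcase. The intrinsic version is only built on AArch64; the
plain-C model spells out the semantics we expect from it, i.e. the low four
lanes hold the sum and the high four lanes are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Hypothetical example function: a 64-bit vector add whose result is
   widened to 128 bits with an all-zero top half.  */
uint16x8_t
add_zero_high (uint16x4_t a, uint16x4_t b)
{
  return vcombine_u16 (vadd_u16 (a, b), vdup_n_u16 (0));
}
#endif

/* Portable reference model of the same operation (illustration only):
   lanes 0-3 hold the wrapped 16-bit sums, lanes 4-7 are zero.  */
void
add_zero_high_ref (const uint16_t a[4], const uint16_t b[4], uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = (uint16_t) (a[i] + b[i]);
  for (int i = 4; i < 8; i++)
    out[i] = 0;
}

/* Compare the intrinsic version against the model.  On non-AArch64 hosts
   only the model is exercised.  */
void
check_add_zero_high (void)
{
  const uint16_t a[4] = { 1, 2, 3, 0xffff };
  const uint16_t b[4] = { 10, 20, 30, 1 };
  uint16_t expect[8], got[8];

  add_zero_high_ref (a, b, expect);
#if defined(__aarch64__)
  vst1q_u16 (got, add_zero_high (vld1_u16 (a), vld1_u16 (b)));
#else
  memcpy (got, expect, sizeof got);
#endif
  assert (memcmp (got, expect, sizeof got) == 0);
}
```

The intent is that add_zero_high needs only the 64-bit ADD, since that
instruction already clears the top half of the Q register, with no separate
zeroing instruction.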
The straightforward approach that we've been taking so far is adding extra
patterns in aarch64-simd.md that match the 64-bit result in a vec_concat with
zeroes. Unfortunately, for big-endian the vec_concat operands to match must be
the other way around, so we would end up adding two extra define_insns per
pattern.
This would lead to too much bloat in aarch64-simd.md.

This patch defines a pair of define_subst constructs that allow us to annotate
patterns in aarch64-simd.md with the <vczle> and <vczbe> subst_attrs, and the
compiler will automatically produce the vec_concat widening patterns, properly
gated for BYTES_BIG_ENDIAN when needed. This seems like the least intrusive way
to describe the extra zeroing semantics.

I've had a look at the generated insn-*.cc files in the build directory and it
seems that define_subst does what we want it to do, in terms of insn conditions
and modes, when applied multiple times on a pattern.

This patch adds the define_subst machinery and adds the annotations to some of
the straightforward binary and unary integer operations. Many more such
annotations are possible, and I aim to add them in future patches.

Bootstrapped and tested on aarch64-none-linux-gnu and on aarch64_be-none-elf.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

        PR target/99195
        * config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Define.
        (add_vec_concat_subst_be): Likewise.
        (vczle): Likewise.
        (vczbe): Likewise.
        (add<mode>3): Rename to...
        (add<mode>3<vczle><vczbe>): ... This.
        (sub<mode>3): Rename to...
        (sub<mode>3<vczle><vczbe>): ... This.
        (mul<mode>3): Rename to...
        (mul<mode>3<vczle><vczbe>): ... This.
        (and<mode>3): Rename to...
        (and<mode>3<vczle><vczbe>): ... This.
        (ior<mode>3): Rename to...
        (ior<mode>3<vczle><vczbe>): ... This.
        (xor<mode>3): Rename to...
        (xor<mode>3<vczle><vczbe>): ... This.
        * config/aarch64/iterators.md (VDZ): Define.

gcc/testsuite/ChangeLog:

        PR target/99195
        * gcc.target/aarch64/simd/pr99195_1.c: New test.

Attachment: vcz.patch
