RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-13 Thread Kyrylo Tkachov via Gcc-patches
> -----Original Message-----
> From: Tamar Christina
> Sent: Wednesday, October 13, 2021 12:06 PM
> To: Kyrylo Tkachov; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-13 Thread Tamar Christina via Gcc-patches
> Hmmm these patterns are identical in what they match they just have the
> effect of printing operands 1 and 2 in a different order.
> Perhaps it's more compact to change the output template into a
> BYTES_BIG_ENDIAN ?
> "uzp1\\t%0., %1., %2."" :
> uzp1\\t%0., %2., %1."
> and avoid having a sec

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-12 Thread Kyrylo Tkachov via Gcc-patches
> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, October 12, 2021 5:25 PM
> To: Kyrylo Tkachov; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [PATCH 4/7]AArch64 Add pattern xtn+

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-12 Thread Tamar Christina via Gcc-patches
Hi All,

This is a new version with BE support and more tests.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (*aarch64_narrow_trunc_le):
	(*aarch64_narrow_trunc_be): New.
	* co

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-09-30 Thread Kyrylo Tkachov via Gcc-patches
> -----Original Message-----
> From: Tamar Christina
> Sent: Wednesday, September 29, 2021 5:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Kyrylo Tkachov; Richard Sandiford
> Subject: [PATCH 4/7]AArch64 Add pattern x

[PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All,

This turns truncate operations with a hi/lo pair into a single permute of
half the bit size of the input, ignoring the top bits (which are
truncated out).

i.e.

void d2 (short * restrict a, int *b, int n)
{
    for (int i = 0; i < n; i++)
      a[i] = b[i];
}

now generates:

.L4:
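
To illustrate why a narrowing pair can be replaced by one permute at half the element size, here is a small scalar model in plain C. It is only a sketch and not code from the patch: the function names, the fixed vector length N, and the little-endian lane numbering (low half of each 32-bit element in the even 16-bit lane) are assumptions made for illustration. It models the xtn+xtn2 pair as element-wise truncation and the permute as "take the even 16-bit lanes of both inputs", which is the selection the uzp1 template quoted in the review would perform under that lane numbering.

```c
#include <stddef.h>
#include <stdint.h>

#define N 4 /* assumed: four 32-bit elements per 128-bit vector */

/* Model of xtn + xtn2: truncate every 32-bit element of lo, then of hi. */
void narrow_pair(const uint32_t lo[N], const uint32_t hi[N],
                 uint16_t out[2 * N])
{
    for (size_t i = 0; i < N; i++) {
        out[i] = (uint16_t)lo[i];
        out[N + i] = (uint16_t)hi[i];
    }
}

/* View 32-bit elements as 16-bit lanes, assuming little-endian lane
   numbering: even lane = low half, odd lane = high half. */
void as_halfword_lanes(const uint32_t v[N], uint16_t lanes[2 * N])
{
    for (size_t i = 0; i < N; i++) {
        lanes[2 * i] = (uint16_t)(v[i] & 0xffffu);
        lanes[2 * i + 1] = (uint16_t)(v[i] >> 16);
    }
}

/* Model of the permute: keep the even 16-bit lanes of a, then of b.
   The odd lanes (the top bits of each element) are simply dropped. */
void unzip_even(const uint16_t a[2 * N], const uint16_t b[2 * N],
                uint16_t out[2 * N])
{
    for (size_t i = 0; i < N; i++) {
        out[i] = a[2 * i];
        out[N + i] = b[2 * i];
    }
}

/* Returns 1 when both models produce the same 16-bit result vector. */
int narrow_matches_unzip(const uint32_t lo[N], const uint32_t hi[N])
{
    uint16_t narrowed[2 * N], lanes_lo[2 * N], lanes_hi[2 * N];
    uint16_t permuted[2 * N];

    narrow_pair(lo, hi, narrowed);
    as_halfword_lanes(lo, lanes_lo);
    as_halfword_lanes(hi, lanes_hi);
    unzip_even(lanes_lo, lanes_hi, permuted);

    for (size_t i = 0; i < 2 * N; i++)
        if (narrowed[i] != permuted[i])
            return 0;
    return 1;
}
```

Calling narrow_matches_unzip on any pair of input vectors should return 1 in this model. With the opposite lane numbering the low halves would sit in the odd lanes instead, which is one way to picture why the big-endian case discussed in the thread needs separate handling.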