RE: [Pushed] aarch64: Fix warning in aarch64_ptrue_reg

2024-10-23 Thread Pengxuan Zheng (QUIC)
My bad. Thanks for fixing this quickly, Andrew! Thanks, Pengxuan > > After r15-4579-g9ffcf1f193b477, we get the following warning/error while > bootstrapping on aarch64: > ``` > ../../gcc/gcc/config/aarch64/aarch64.cc: In function ‘rtx_def* > aarch64_ptrue_reg(machine_mode, unsigned int)’: > ../.

RE: [PATCH v3] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-10-23 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This is similar to the recent improvements to the Advanced SIMD > > popcount expansion by using SVE. We can utilize SVE to generate more > > efficient code for scalar mode popcount too. > > > > Changes since v1: > > * v2: Add a new VNx1BI mode and a new test case for V

RE: [PATCH v2] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-10-14 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This is similar to the recent improvements to the Advanced SIMD > > popcount expansion by using SVE. We can utilize SVE to generate more > > efficient code for scalar mode popcount too. > > > > Changes since v1: > > * v2: Add a new VNx1BI mode and a new test case for V

RE: [PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-09-26 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This is similar to the recent improvements to the Advanced SIMD > > popcount expansion by using SVE. We can utilize SVE to generate more > > efficient code for scalar mode popcount too. > > > > PR target/113860 > > > > gcc/ChangeLog: > > > > * config/aarch64/aa

RE: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

2024-09-18 Thread Pengxuan Zheng (QUIC)
> > Pengxuan Zheng writes: > > > We can still use SVE's INDEX instruction to construct vectors even > > > if not all elements are constants. For example, { 0, x, 2, 3 } can > > > be constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, > > > 3 }, and then set the elements which are non-

RE: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-17 Thread Pengxuan Zheng (QUIC)
> > > On 16 Sep 2024, at 16:32, Richard Sandiford > wrote: > > > > > > External email: Use caution opening links or attachments > > > > > > > > > "Pengxuan Zheng (QUIC)" writes: > > >>> On Thu, Sep 12, 2024 at 2:

RE: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-17 Thread Pengxuan Zheng (QUIC)
> > On 16 Sep 2024, at 16:32, Richard Sandiford > wrote: > > > > External email: Use caution opening links or attachments > > > > > > "Pengxuan Zheng (QUIC)" writes: > >>> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng > >>&

RE: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

2024-09-17 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > We can still use SVE's INDEX instruction to construct vectors even if > > not all elements are constants. For example, { 0, x, 2, 3 } can be > > constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 }, > > and then set the elements which are non-constants
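
The { 0, x, 2, 3 } example quoted above can be modelled in scalar C. This is only an illustrative sketch: the function name `init_part_variable` is hypothetical and is not taken from the patch, and it mirrors the two described steps (generate the linear series, then overwrite the non-constant element) rather than the code GCC actually emits.

```c
#include <stdint.h>

/* Hypothetical scalar model of building { 0, x, 2, 3 }.
   Step 1 mimics "INDEX #0, #1", which yields { 0, 1, 2, 3 }.
   Step 2 overwrites the lane whose value is not a constant. */
static void init_part_variable(int32_t out[4], int32_t x)
{
    for (int i = 0; i < 4; i++)
        out[i] = i;          /* linear series: base 0, step 1 */
    out[1] = x;              /* insert the variable element */
}
```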

RE: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-16 Thread Pengxuan Zheng (QUIC)
> "Pengxuan Zheng (QUIC)" writes: > >> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng > >> wrote: > >> > > >> > SVE's INDEX instruction can be used to populate vectors by values > >> > starting from "base" and incr

RE: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-12 Thread Pengxuan Zheng (QUIC)
> > Pengxuan Zheng writes: > > > SVE's INDEX instruction can be used to populate vectors by values > > > starting from "base" and incremented by "step" for each subsequent > > > value. We can take advantage of it to generate vector constants if > > > TARGET_SVE is available and the base and step v

RE: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-12 Thread Pengxuan Zheng (QUIC)
> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng > wrote: > > > > SVE's INDEX instruction can be used to populate vectors by values > > starting from "base" and incremented by "step" for each subsequent > > value. We can take advantage of it to generate vector constants if > > TARGET_SVE is availa

RE: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-11 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > SVE's INDEX instruction can be used to populate vectors by values > > starting from "base" and incremented by "step" for each subsequent > > value. We can take advantage of it to generate vector constants if > > TARGET_SVE is available and the base and step values are
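
As a rough illustration of the "base" and "step" behaviour described in the snippet above, the following scalar C sketch computes the same series element by element. The name `index_model` is an assumption made for this example; it is not a GCC or ACLE function, and overflow handling is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the series described for INDEX:
   element i holds base + i * step. Overflow is not handled here. */
static void index_model(int32_t *out, size_t n, int32_t base, int32_t step)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base + (int32_t)i * step;
}
```

For instance, a base of 0 and a step of 1 over four elements gives { 0, 1, 2, 3 }.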

RE: [PATCH v2] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-08-01 Thread Pengxuan Zheng (QUIC)
Pushed as r15-2659-ge4b8db26de352. Pengxuan > This patch improves the Advanced SIMD popcount expansion by using SVE if > available. > > For example, GCC currently generates the following code sequence for V2DI: > cnt v31.16b, v31.16b > uaddlp v31.8h, v31.16b > uaddlp v31.4s, v31.8h >
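
The visible part of the quoted sequence (a per-byte `cnt` followed by two `uaddlp` steps) can be modelled in scalar C as below. This sketch covers only the three instructions shown before the excerpt is cut off, so it stops at 32-bit lanes; the function names are hypothetical, and the code assumes that `uaddlp` adds adjacent pairs of lanes into lanes of twice the width.

```c
#include <stdint.h>

/* Population count of one byte, standing in for one lane of "cnt". */
static uint8_t byte_popcount(uint8_t b)
{
    uint8_t n = 0;
    while (b) {
        n = (uint8_t)(n + (b & 1u));
        b = (uint8_t)(b >> 1);
    }
    return n;
}

/* Hypothetical model of: cnt .16b; uaddlp .8h, .16b; uaddlp .4s, .8h.
   Produces one bit count per 32-bit lane of a 16-byte vector. */
static void popcount_v4si_model(const uint8_t in[16], uint32_t out[4])
{
    uint8_t cnt[16];
    uint16_t half[8];

    for (int i = 0; i < 16; i++)
        cnt[i] = byte_popcount(in[i]);                       /* cnt */
    for (int i = 0; i < 8; i++)
        half[i] = (uint16_t)(cnt[2 * i] + cnt[2 * i + 1]);   /* first uaddlp */
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)half[2 * i] + half[2 * i + 1];    /* second uaddlp */
}
```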

RE: [PATCH] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-07-31 Thread Pengxuan Zheng (QUIC)
> Sorry for the slow review. > > Pengxuan Zheng writes: > > This patch improves the Advanced SIMD popcount expansion by using SVE > > if available. > > > > For example, GCC currently generates the following code sequence for V2DI: > > cnt v31.16b, v31.16b > > uaddlp v31.8h, v31.16b > >

RE: [PATCH v9] aarch64: Add vector popcount besides QImode [PR113859]

2024-07-02 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we now generate the following for V8HI: > > cnt v1.16b,

RE: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
> > On 6/28/24 6:18 AM, Pengxuan Zheng wrote: > > > This patch improves GCC’s vectorization of __builtin_popcount for > > > aarch64 target by adding popcount patterns for vector modes besides > > > QImode, i.e., HImode, SImode and DImode. > > > > > > With this patch, we now generate the following f

RE: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
> On 6/28/24 6:18 AM, Pengxuan Zheng wrote: > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we now generate the following for V8HI: > >

RE: [PATCH v7] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
Please ignore this patch. I accidentally added unrelated changes. I'll push a correct version shortly. Sorry for the noise. Thanks, Pengxuan > This patch improves GCC’s vectorization of __builtin_popcount for aarch64 > target by adding popcount patterns for vector modes besides QImode, i.e., > HIm

RE: [PATCH v5] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-27 Thread Pengxuan Zheng (QUIC)
Thanks, Richard! I've updated the patch accordingly. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655912.html Please let me know if any other changes are needed. Thanks, Pengxuan > Sorry for the slow reply. > > Pengxuan Zheng writes: > > This patch improves GCC’s vectorization of __buil

RE: [PATCH v4] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-18 Thread Pengxuan Zheng (QUIC)
> On Mon, Jun 17, 2024 at 11:25 PM Pengxuan Zheng > wrote: > > > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we now generate the fol

RE: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-17 Thread Pengxuan Zheng (QUIC)
> Hi, > > > -Original Message- > > From: Pengxuan Zheng > > Sent: Friday, June 14, 2024 12:57 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Pengxuan Zheng > > Subject: [PATCH v3] aarch64: Add vector popcount besides QImode > > [PR113859] > > > > This patch improves GCC’s vectorization of __

RE: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]

2024-06-17 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is > > implemented using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 > (V4SI->V4HI). > > > > PR target/113882 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md (fix_trun
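
The two-step conversion described above (V4SF->V4SI, then V4SI->V4HI) can be sketched in scalar C. The function name `fix_trunc_v4sf_v4hi_model` is made up for this example, and the sketch assumes every input fits in a 16-bit signed integer after truncation toward zero; out-of-range values are not modelled.

```c
#include <stdint.h>

/* Hypothetical scalar model of the two-step V4SF->V4HI conversion.
   Inputs are assumed to be in range for int16_t after truncation. */
static void fix_trunc_v4sf_v4hi_model(const float in[4], int16_t out[4])
{
    int32_t wide[4];

    for (int i = 0; i < 4; i++)
        wide[i] = (int32_t)in[i];     /* V4SF->V4SI: truncate toward zero */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)wide[i];    /* V4SI->V4HI: narrow each element */
}
```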

RE: [PATCH v2] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-13 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we now generate the following for V8HI: > > cnt v1.16b,

RE: [PATCH] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-12 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we now generate the following for HImode: > > cnt v1.16

Ping [PATCH] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-11 Thread Pengxuan Zheng (QUIC)
Ping https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html > -Original Message- > From: Pengxuan Zheng (QUIC) > Sent: Tuesday, April 30, 2024 5:32 PM > To: gcc-patches@gcc.gnu.org > Cc: Andrew Pinski (QUIC) ; Pengxuan Zheng > (QUIC) > Subject: [PATCH

RE: [PATCH] aarch64: Add vector floating point trunc pattern

2024-06-11 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch is a follow-up of r15-1079-g230d62a2cdd16c to add vector > > floating point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions > > by renaming the existing > > aarch64_float_truncate_lo_ pattern to the standard > > optab one, i.e., trunc2. This allows t

RE: [PATCH v2] aarch64: Add vector floating point extend pattern [PR113880, PR113869]

2024-06-06 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch adds vector floating point extend pattern for V2SF->V2DF > > and V4HF->V4SF conversions by renaming the existing > > aarch64_float_extend_lo_ > > pattern to the standard optab one, i.e., extend2. This > > allows the vectorizer to vectorize certain

Ping [PATCH] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-02 Thread Pengxuan Zheng (QUIC)
Ping > -Original Message- > From: Pengxuan Zheng (QUIC) > Sent: Tuesday, April 30, 2024 5:32 PM > To: gcc-patches@gcc.gnu.org > Cc: Andrew Pinski (QUIC) ; Pengxuan Zheng > (QUIC) > Subject: [PATCH] aarch64: Add vector popcount besides QImode [PR113859] > >

RE: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-31 Thread Pengxuan Zheng (QUIC)
> > Pengxuan Zheng writes: > > > vget_low_2.c is a test case for little-endian, but we missed the > > > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian. > > > > Ok, thanks. > > > > If you'd l

RE: [PATCH] aarch64: Add vector floating point extend patterns [PR113880, PR113869]

2024-05-30 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > This patch improves vectorization of certain floating point widening > > operations for the aarch64 target by adding vector floating point > > extend patterns for > > V2SF->V2DF and V4HF->V4SF conversions. > > > > PR target/113880 > > PR target/113869 > > > > g

RE: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-30 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng writes: > > vget_low_2.c is a test case for little-endian, but we missed the > > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian. > > Ok, thanks. > > If you'd like write access,

RE: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Pengxuan Zheng (QUIC)
> On Mon, May 20, 2024 at 2:57 AM Richard Sandiford > wrote: > > > > Pengxuan Zheng writes: > > > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up > > > more optimization opportunities for gimple optimizers. > > > > > > While we are here, we also remove the vget_low_* definition