My bad. Thanks for fixing this quickly, Andrew!
Thanks,
Pengxuan
>
> After r15-4579-g9ffcf1f193b477, we get the following warning/error while
> bootstrapping on aarch64:
> ```
> ../../gcc/gcc/config/aarch64/aarch64.cc: In function ‘rtx_def*
> aarch64_ptrue_reg(machine_mode, unsigned int)’:
> ../.
> Pengxuan Zheng writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> > Changes since v1:
> > * v2: Add a new VNx1BI mode and a new test case for V
> Pengxuan Zheng writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> > PR target/113860
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aa
> > Pengxuan Zheng writes:
> > > We can still use SVE's INDEX instruction to construct vectors even
> > > if not all elements are constants. For example, { 0, x, 2, 3 } can
> > > be constructed by first using "INDEX #0, #1" to generate { 0, 1, 2,
> > > 3 }, and then set the elements which are non-
> > > On 16 Sep 2024, at 16:32, Richard Sandiford wrote:
> > >
> > > "Pengxuan Zheng (QUIC)" writes:
> > >>> On Thu, Sep 12, 2024 at 2:
> > On 16 Sep 2024, at 16:32, Richard Sandiford wrote:
> >
> > "Pengxuan Zheng (QUIC)" writes:
> >>> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng
> >>&
> Pengxuan Zheng writes:
> > We can still use SVE's INDEX instruction to construct vectors even if
> > not all elements are constants. For example, { 0, x, 2, 3 } can be
> > constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 },
> > and then set the elements which are non-constants
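As a rough illustration of the idea quoted above (a scalar C model, not the patch's actual code): fill the vector with the series that "INDEX #0, #1" would produce, then overwrite the one lane that is not a constant. The names index_fill and build_0_x_2_3 are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the series INDEX produces: out[i] = base + i * step. */
static void index_fill(int32_t *out, size_t n, int32_t base, int32_t step)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base + (int32_t)i * step;
}

/* Build { 0, x, 2, 3 }: start from the series { 0, 1, 2, 3 }, then
   replace lane 1, the only non-constant element, with x. */
static void build_0_x_2_3(int32_t out[4], int32_t x)
{
    index_fill(out, 4, 0, 1);
    out[1] = x;
}
```

Calling build_0_x_2_3(v, x) leaves v[0], v[2] and v[3] at their series values and puts x in v[1], which mirrors the "INDEX first, then set the non-constant elements" order described in the mail.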
> "Pengxuan Zheng (QUIC)" writes:
> >> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng
> >> wrote:
> >> >
> >> > SVE's INDEX instruction can be used to populate vectors by values
> >> > starting from "base" and incr
> > Pengxuan Zheng writes:
> > > SVE's INDEX instruction can be used to populate vectors by values
> > > starting from "base" and incremented by "step" for each subsequent
> > > value. We can take advantage of it to generate vector constants if
> > > TARGET_SVE is available and the base and step v
> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng
> wrote:
> >
> > SVE's INDEX instruction can be used to populate vectors by values
> > starting from "base" and incremented by "step" for each subsequent
> > value. We can take advantage of it to generate vector constants if
> > TARGET_SVE is availa
> Pengxuan Zheng writes:
> > SVE's INDEX instruction can be used to populate vectors by values
> > starting from "base" and incremented by "step" for each subsequent
> > value. We can take advantage of it to generate vector constants if
> > TARGET_SVE is available and the base and step values are
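A minimal sketch of the kind of check this implies, written as plain C rather than GCC internals: decide whether a constant vector is a "base plus i times step" series and, if so, report the base and step. The function name is_index_series is hypothetical, and the test for whether base and step are acceptable to the instruction is deliberately left out because the quoted text is cut off before stating that condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if v[0..n-1] has the form base + i * step, storing the
   base and step through the out-parameters.  Differences are computed
   in 64 bits so that 32-bit inputs cannot overflow. */
static bool is_index_series(const int32_t *v, size_t n,
                            int64_t *base, int64_t *step)
{
    if (n < 2)
        return false;

    int64_t s = (int64_t)v[1] - (int64_t)v[0];
    for (size_t i = 2; i < n; i++)
        if ((int64_t)v[i] - (int64_t)v[i - 1] != s)
            return false;

    *base = v[0];
    *step = s;
    return true;
}
```

For { 0, 1, 2, 3 } this reports base 0 and step 1, while { 0, 1, 2, 4 } is rejected because the last difference breaks the series.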
Pushed as r15-2659-ge4b8db26de352.
Pengxuan
> This patch improves the Advanced SIMD popcount expansion by using SVE if
> available.
>
> For example, GCC currently generates the following code sequence for V2DI:
> cnt v31.16b, v31.16b
> uaddlp v31.8h, v31.16b
> uaddlp v31.4s, v31.8h
>
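For readers unfamiliar with the sequence quoted above, here is a scalar C model of what the "cnt" plus repeated "uaddlp" approach computes for a single 64-bit lane: count the set bits in each byte, then add adjacent pairs, halving the number of partial sums each time. This is only a sketch of the arithmetic, not the code GCC emits, and the names cnt_byte and popcount64_model are hypothetical. The quoted listing is truncated, so the sketch simply runs the pairwise step until one sum remains (8 bytes, then 4, 2 and 1 partial sums).

```c
#include <stdint.h>

/* Model of CNT on one byte: the number of set bits in b. */
static unsigned cnt_byte(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/* Popcount of one 64-bit lane: per-byte counts followed by pairwise
   widening adds, in the spirit of CNT followed by UADDLP steps. */
static unsigned popcount64_model(uint64_t x)
{
    unsigned sums[8];

    for (int i = 0; i < 8; i++)
        sums[i] = cnt_byte((uint8_t)(x >> (8 * i)));

    /* Each pass adds adjacent pairs: 8 -> 4 -> 2 -> 1 partial sums. */
    for (int width = 8; width > 1; width /= 2)
        for (int i = 0; i < width / 2; i++)
            sums[i] = sums[2 * i] + sums[2 * i + 1];

    return sums[0];
}
```

The three passes of the outer loop correspond to the three pairwise-add steps a 64-bit element needs after the byte-wise count.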
> Sorry for the slow review.
>
> Pengxuan Zheng writes:
> > This patch improves the Advanced SIMD popcount expansion by using SVE
> > if available.
> >
> > For example, GCC currently generates the following code sequence for V2DI:
> > cnt v31.16b, v31.16b
> > uaddlp v31.8h, v31.16b
> >
> Pengxuan Zheng writes:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> > cnt v1.16b,
> > On 6/28/24 6:18 AM, Pengxuan Zheng wrote:
> > > This patch improves GCC’s vectorization of __builtin_popcount for
> > > aarch64 target by adding popcount patterns for vector modes besides
> > > QImode, i.e., HImode, SImode and DImode.
> > >
> > > With this patch, we now generate the following f
> On 6/28/24 6:18 AM, Pengxuan Zheng wrote:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> >
Please ignore this patch. I accidentally added unrelated changes. I'll push a
correct version shortly.
Sorry for the noise.
Thanks,
Pengxuan
> This patch improves GCC’s vectorization of __builtin_popcount for aarch64
> target by adding popcount patterns for vector modes besides QImode, i.e.,
> HIm
Thanks, Richard! I've updated the patch accordingly.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655912.html
Please let me know if any other changes are needed.
Thanks,
Pengxuan
> Sorry for the slow reply.
>
> Pengxuan Zheng writes:
> > This patch improves GCC’s vectorization of __buil
> On Mon, Jun 17, 2024 at 11:25 PM Pengxuan Zheng
> wrote:
> >
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the fol
> Hi,
>
> > -----Original Message-----
> > From: Pengxuan Zheng
> > Sent: Friday, June 14, 2024 12:57 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Pengxuan Zheng
> > Subject: [PATCH v3] aarch64: Add vector popcount besides QImode
> > [PR113859]
> >
> > This patch improves GCC’s vectorization of __
> Pengxuan Zheng writes:
> > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is
> > implemented using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2
> > (V4SI->V4HI).
> >
> > PR target/113882
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (fix_trun
> Pengxuan Zheng writes:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> > cnt v1.16b,
> Pengxuan Zheng writes:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for HImode:
> > cnt v1.16
Ping https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html
> -----Original Message-----
> From: Pengxuan Zheng (QUIC)
> Sent: Tuesday, April 30, 2024 5:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) ; Pengxuan Zheng
> (QUIC)
> Subject: [PATCH
> Pengxuan Zheng writes:
> > This patch is a follow-up of r15-1079-g230d62a2cdd16c to add vector
> > floating point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions
> > by renaming the existing
> > aarch64_float_truncate_lo_ pattern to the standard
> > optab one, i.e., trunc2. This allows t
> Pengxuan Zheng writes:
> > This patch adds vector floating point extend pattern for V2SF->V2DF
> > and V4HF->V4SF conversions by renaming the existing
> > aarch64_float_extend_lo_ pattern to the standard optab one, i.e.,
> > extend2. This allows the vectorizer to vectorize certain
Ping
> -----Original Message-----
> From: Pengxuan Zheng (QUIC)
> Sent: Tuesday, April 30, 2024 5:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) ; Pengxuan Zheng
> (QUIC)
> Subject: [PATCH] aarch64: Add vector popcount besides QImode [PR113859]
>
>
> > Pengxuan Zheng writes:
> > > vget_low_2.c is a test case for little-endian, but we missed the
> > > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.
> >
> > Ok, thanks.
> >
> > If you'd l
> Pengxuan Zheng writes:
> > This patch improves vectorization of certain floating point widening
> > operations for the aarch64 target by adding vector floating point
> > extend patterns for
> > V2SF->V2DF and V4HF->V4SF conversions.
> >
> > PR target/113880
> > PR target/113869
> >
> > g
> Pengxuan Zheng writes:
> > vget_low_2.c is a test case for little-endian, but we missed the
> > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.
>
> Ok, thanks.
>
> If you'd like write access,
> On Mon, May 20, 2024 at 2:57 AM Richard Sandiford
> wrote:
> >
> > Pengxuan Zheng writes:
> > > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up
> > > more optimization opportunities for gimple optimizers.
> > >
> > > While we are here, we also remove the vget_low_* definition