Re: [PATCH] vect: Support early break with gswitch statements

2025-09-18 Thread Pengfei Li
> Does this fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 ? Ah yes, thanks for the reminder. I will wait for review comments first, and then add the PR reference in my next version. Thanks, Pengfei

[PATCH] vect: Support early break with gswitch statements

2025-09-18 Thread Pengfei Li
This patch adds vectorization support for early-break loops with gswitch statements. Such gswitches may come from original switch-case constructs in the source or from the iftoswitch pass, which rewrites if conditions with a chain of comparisons, like the one below, into gswitch statements. if (a[i] == c1
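
As a rough illustration of the kind of loop this description refers to: the example in the message is cut off above, so the constants 1, 3 and 5 and all function names below are hypothetical. An early-break search over a chain of equality comparisons can be written equivalently with a switch. This is only a sketch of the source-level shape, not the GIMPLE that the vectorizer sees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical early-break loop: stop at the first element equal to one
   of a few constants.  A chain of comparisons like this is the kind of
   condition that may be rewritten into a switch.  */
ptrdiff_t find_first_if_chain(const int *a, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++)
        if (a[i] == 1 || a[i] == 3 || a[i] == 5)
            return i;   /* early break */
    return -1;
}

/* The same loop written with an explicit switch-case construct.  */
ptrdiff_t find_first_switch(const int *a, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        switch (a[i]) {
        case 1:
        case 3:
        case 5:
            return i;   /* early break */
        default:
            break;
        }
    }
    return -1;
}

/* Both forms must agree on every input.  */
void check_early_break_forms(void)
{
    int a[] = {2, 4, 6, 5, 1};
    assert(find_first_if_chain(a, 5) == find_first_switch(a, 5));
}
```

Both functions leave the loop as soon as a match is found, which is what makes this an early-break loop in both forms.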

[PATCH v2] vect: Improve vectorization for small-trip-count loops using subvectors

2025-08-20 Thread Pengfei Li
Hi all, This is a rebase of my earlier patch: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683011.html Changes in v2: - Only resolves the conflicts after the hybrid SLP removal - No functionality or design change We had a lot of discussions around this before, mainly on whether we could av

[PATCH][13 BACKPORT] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-15 Thread Pengfei Li
Hi, This patch backports the fix of r16-3083 to gcc-13. Compared to the trunk version, this is slightly different because those RTL patterns in gcc-13 do not yet use the compact syntax for multiple alternatives. But this patch is functionally identical to the trunk fix. Bootstrapped and regteste

Re: [PATCH v2] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-15 Thread Pengfei Li
Hi Reviewers, > I’d also ask for a slightly more descriptive sentence like “Use vg > constraint for alternative so-and-so”. > Ok to push whatever reword you come up with. This has been committed to trunk as r16-3083 for about one week. I wonder if I could consider backporting it now. The change

[PATCH v2] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-08 Thread Pengfei Li
Below v2 patch just updates the commit message in v1. Please let me know if it's good enough now. Thanks, Pengfei -- >8 -- This patch fixes incorrect constraints in RTL patterns for AArch64 SVE gather/scatter with type widening/narrowing and vector-plus-immediate addressing. The bug leads to bel

[PATCH] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-07 Thread Pengfei Li
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE gather/scatter with type widening/narrowing and vector-plus-immediate addressing. The bug leads to below "immediate offset out of range" errors during assembly, eventually causing compilation failures. /tmp/ccsVqBp1.s: Assemble

[COMMITTED] MAINTAINERS: Add myself to write after approval

2025-08-07 Thread Pengfei Li
Leoshkevichiii Kriang Lerdsuwanakijlerdsuwa Pan Li - +Pengfei Li pfustc Renlin Li renlin Xinliang David Li davidxl Kewen Lin

Re: [PATCH] vect: Extend peeling and versioning for alignment to VLA modes

2025-08-06 Thread Pengfei Li
Hi Richard, > Can you split this fix out? I guess the bug is latent on branches > as well? > Otherwise the patch looks good to me - thanks for the very > nice work. Thanks for your review. The small fix is in a code path that had been unreachable before, since peeling with masking for VLA mode

[PATCH] vect: Extend peeling and versioning for alignment to VLA modes

2025-08-01 Thread Pengfei Li
This patch extends the support for peeling and versioning for alignment from VLS modes to VLA modes. The key change is allowing the DR target alignment to be set to a non-constant poly_int. Since the value must be a power-of-two, for variable VFs, the power-of-two check is deferred to runtime throu
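
For readers unfamiliar with peeling for alignment, the general idea can be sketched in scalar C. This is not GCC's implementation: the function names are hypothetical, and the sketch assumes that the target alignment is a power of two, that the element size divides it, and that the address is already element-aligned. With VLA modes the alignment value is only known at run time, which is why a power-of-two test like the one below would have to be a runtime check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two.  */
bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Number of scalar iterations to peel so that the access address becomes
   aligned to target_align bytes.  Assumes target_align is a power of two,
   elem_size divides target_align, and addr is elem_size-aligned.  */
uint64_t peel_iterations(uint64_t addr, uint64_t elem_size,
                         uint64_t target_align)
{
    uint64_t misalign = addr & (target_align - 1);   /* bytes past boundary */
    uint64_t bytes_to_go = (target_align - misalign) & (target_align - 1);
    return bytes_to_go / elem_size;
}

/* Small self-check of the two helpers.  */
void check_peeling(void)
{
    assert(is_pow2(16) && !is_pow2(48));
    assert(peel_iterations(0x1008, 4, 16) == 2);
    assert(peel_iterations(0x1000, 4, 16) == 0);
}
```

The mask arithmetic in `peel_iterations` is only valid for power-of-two alignments, which is the reason the check matters.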

Re: [PATCH v2] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
Hi, The comment in v2 is addressed. Tested again on both trunk and gcc-15. Ok for trunk and gcc-15? Changes in v3: - Extract the constant VF check out. Changes in v2: - Remove the condition of dr_safe_speculative_read_required. - Add a constant VF check. Thanks, Pengfei -- >8 -- This patch

[PATCH v2] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
Hi, I have updated the fix and the test case as you suggested. Patch is re-tested on trunk and gcc-15. Ok for both trunk and gcc-15? Thanks, Pengfei -- >8 -- This patch fixes a segmentation fault issue that can occur in vectorized loops with an early break. When GCC vectorizes such loops, it ma

[PATCH v2] vect: Add missing skip-vector check for peeling with versioning [PR121020]

2025-07-29 Thread Pengfei Li
Hi, I have adjusted the test case as you suggested. Ok for trunk? Thanks, Pengfei -- >8 -- This fixes a miscompilation issue introduced by the enablement of combined loop peeling and versioning. A test case that reproduces the issue is included in the patch. When performing loop peeling, GCC u

Re: [PATCH] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
Hi Richi, > But why do we need to reject VLA vectorization for versioning when > the target only requires element alignment? I could for example > think that arm could have gone with requiring NEON vector size > alignment for SVE accesses. > > I do agree that keeping the old check is "safe", but

Re: [PATCH] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
Hi Richi, > > > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > > > > > index 75e06ff28e6..8595c76eae2 100644 > > > > > --- a/gcc/tree-vect-data-refs.cc > > > > > +++ b/gcc/tree-vect-data-refs.cc > > > > > @@ -2972,7 +2972,8 @@ vect_enhance_data_refs_alignment (loop_vec_

Re: [PATCH] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
Hi, Thanks Tamar and Richi for the review. > > I wonder about what the intention of this code was. It seems to me that it > > was > > trying to disable versioning for VLA, but then also doubling up and using > > the > > mode as the alignment. But the cross iteration alignment check below this

Re: [PATCH] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-29 Thread Pengfei Li
PING Ok for trunk and gcc-15? > [PATCH] vect: Fix insufficient alignment requirement for speculative loads > [PR121190]

[PING] [PATCH] vect: Add missing skip-vector check for peeling with versioning [PR121020]

2025-07-29 Thread Pengfei Li
PING > [PATCH] vect: Add missing skip-vector check for peeling with versioning > [PR121020]

[PATCH] vect: Add missing skip-vector check for peeling with versioning [PR121020]

2025-07-21 Thread Pengfei Li
This fixes a miscompilation issue introduced by the enablement of combined loop peeling and versioning. A test case that reproduces the issue is included in the patch. When performing loop peeling, GCC usually inserts a skip-vector check. This ensures that after peeling, there are enough remaining

[PATCH] vect: Fix insufficient alignment requirement for speculative loads [PR121190]

2025-07-21 Thread Pengfei Li
This patch fixes a segmentation fault issue that can occur in vectorized loops with an early break. When GCC vectorizes such loops, it may insert a versioning check to ensure that data references (DRs) with speculative loads are aligned. The check normally requires DRs to be aligned to the vector m

[PATCH v2] vect: Use combined peeling and versioning for mutually aligned DRs

2025-06-11 Thread Pengfei Li
Current GCC uses either peeling or versioning, but not in combination, to handle unaligned data references (DRs) during vectorization. This limitation causes some loops with early break to fall back to scalar code at runtime. Consider the following loop with DRs in its early break condition:

Re: [PATCH] vect: Use combined peeling and versioning for mutually aligned DRs

2025-06-10 Thread Pengfei Li
Hi Alex, > It might be nice to at least experiment with supporting DRs with > different steps as follow-on work. I agree that we should leave it out > for the first version to keep things simple. > FWIW, in case it's of interest, I wrote a script to calculate the > possible combinations of align

[PATCH] vect: Use combined peeling and versioning for mutually aligned DRs

2025-06-06 Thread Pengfei Li
Current GCC uses either peeling or versioning, but not in combination, to handle unaligned data references (DRs) during vectorization. This limitation causes some loops with early break to fall back to scalar code at runtime. Consider the following loop with DRs in its early break condition:

Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-06-04 Thread Pengfei Li
Thank you for all suggestions above. > > I see. So this clearly is a feature on instructions then, not modes. > > In fact it might be profitable to use unpredicated add to avoid > > computing the loop mask for a specific element width completely even > > when that would require more operation for

[PING] [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-06-02 Thread Pengfei Li
k. Thanks again for your time and all your inputs. -- Thanks, Pengfei From: Tamar Christina Sent: 09 May 2025 15:03 To: Richard Biener Cc: Richard Sandiford; Pengfei Li; gcc-patches@gcc.gnu.org; ktkac...@nvidia.com Subject: RE: [PATCH] vect: Improve vect

Re: [PING^2][PATCH v3] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-06-02 Thread Pengfei Li
PING^2 From: Pengfei Li Sent: 22 May 2025 9:51 To: gcc-patches@gcc.gnu.org Cc: rguent...@suse.de; jeffreya...@gmail.com; pins...@gmail.com Subject: [PING][PATCH v3] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors Hi, Just a

[PING][PATCH v3] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-22 Thread Pengfei Li
Hi, Just a gentle ping for below patch v3. I’ve made minor changes from v2 to v3, as listed below: - Added check if IFN_AVG_FLOOR is supported. - Wrapped new code in match.pd with macro "#ifdef GIMPLE". > This patch folds vector expressions of the form (x + y) >> 1 into > IFN_AVG_FLOOR (x, y), r

[PATCH v3] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-12 Thread Pengfei Li
This patch folds vector expressions of the form (x + y) >> 1 into IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that support averaging operations. For example, it can help improve the codegen on AArch64 from: add v0.4s, v0.4s, v31.4s ushr v0.4s, v0.4s, 1 to:
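
To make the arithmetic behind an averaging operation concrete, here is a scalar sketch assuming unsigned 32-bit elements. The function names are hypothetical and this is not the match.pd pattern itself. A floor average keeps the carry of x + y, so it equals the literal (x + y) >> 1 only when the addition does not wrap. The snippet above does not show which conditions the patch checks before folding.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average computed in a wider type, so the carry is not lost.  */
uint32_t avg_floor_wide(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y) >> 1);
}

/* Same value without widening: shared bits plus half of differing bits.  */
uint32_t avg_floor_nowiden(uint32_t x, uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}

/* The literal (x + y) >> 1 in 32 bits; wraps if x + y overflows.  */
uint32_t add_shift(uint32_t x, uint32_t y)
{
    return (x + y) >> 1;
}

/* The three forms agree when x + y fits in 32 bits.  */
void check_avg_floor(void)
{
    assert(avg_floor_wide(7, 4) == avg_floor_nowiden(7, 4));
    assert(avg_floor_wide(7, 4) == add_shift(7, 4));
}
```

When x + y wraps, for example x = 0xFFFFFFFF and y = 1, the first two functions still agree while `add_shift` differs.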

Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Pengfei Li
Hi Richard Biener, As Richard Sandiford has already addressed your questions in another email, I just wanted to add a few below. > That said, we already have unmasked ABS in the IL: > > vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, ... }, { 0, ...

[PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-08 Thread Pengfei Li
This patch improves the auto-vectorization for loops with known small trip counts by enabling the use of subvectors - bit fields of original wider vectors. A subvector must have the same vector element type as the original vector and enough bits for all vector elements to be processed in the loop.

[PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-08 Thread Pengfei Li
This patch folds vector expressions of the form (x + y) >> 1 into IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that support averaging operations. For example, it can help improve the codegen on AArch64 from: add v0.4s, v0.4s, v31.4s ushr v0.4s, v0.4s, 1 to:

Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-05-02 Thread Pengfei Li
> Heh. This is a bit of a hobby-horse of mine. IMO we should be trying > to make the generic, target-independent vector operations as useful > as possible, so that people only need to resort to target-specific > intrinsics if they're doing something genuinely target-specific. > At the moment, we

Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-05-01 Thread Pengfei Li
Thank you for the comments. > I don't think we can use an unbounded recursive walk, since that > would become quadratic if we ever used it when optimising one > AND in a chain of ANDs. (And using this function for ANDs > seems plausible.) Maybe we should be handling the information > in a simila

[PATCH] AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-04-28 Thread Pengfei Li
This patch implements the folding of a vector addition followed by a logical shift right by 1 (add + lsr #1) on AArch64 into an unsigned halving add, allowing GCC to emit NEON or SVE2 UHADD instructions. For example, this patch helps improve the codegen from: add v0.4s, v0.4s, v31.4s

Re: [PATCH] simplify-rtx: Combine bitwise operations in more cases

2025-04-28 Thread Pengfei Li
Thanks Richard for all review comments. I have addressed the comments and sent a v2 patch in a new email thread. -- Thanks, Pengfei

[PATCH v2] simplify-rtx: Combine bitwise operations in more cases

2025-04-28 Thread Pengfei Li
This patch transforms RTL expressions of the form (subreg (not X)) into (not (subreg X)) if the subreg is an operand of another binary logical operation. This transformation can expose opportunities to combine more logical operations. For example, it improves the codegen of the following AArch64 N
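
The equivalence behind this transformation can be checked in scalar C. The sketch below models a lowpart subreg as a truncating cast from 32 to 8 bits. That modelling, the chosen widths and the function names are all assumptions made for illustration, and it is not the simplify-rtx code.

```c
#include <assert.h>
#include <stdint.h>

/* (subreg (not X)): invert in the wide mode, then take the low 8 bits.  */
uint8_t low_of_not(uint32_t x)
{
    return (uint8_t)(~x);
}

/* (not (subreg X)): take the low 8 bits, then invert in the narrow mode.  */
uint8_t not_of_low(uint32_t x)
{
    return (uint8_t)~(uint8_t)x;
}

/* With the NOT on the outside it sits directly under the AND, which is
   the shape an and-not style instruction could match.  */
uint8_t and_not_low(uint32_t x, uint8_t m)
{
    return (uint8_t)(m & not_of_low(x));
}

/* Both orders of NOT and truncation give the same low bits.  */
void check_subreg_not(void)
{
    assert(low_of_not(0x12345678u) == not_of_low(0x12345678u));
}
```

Because bitwise NOT acts on each bit independently, inverting before or after dropping the high bits gives the same low bits.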

[PATCH] simplify-rtx: Combine bitwise operations in more cases

2025-04-23 Thread Pengfei Li
This patch transforms RTL expressions of the form (subreg (not X) off) into (not (subreg X off)) when the subreg is an operand of a bitwise AND or OR. This transformation can expose opportunities to combine a NOT operation with the bitwise AND/OR. For example, it improves the codegen of the follow