On Mon, 22 Sep 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener
> > Sent: 22 September 2025 12:34
> > To: Pengfei Li
> > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; Tamar Christina
> >
> > Subject: Re: [PAT
On Thu, Sep 18, 2025 at 2:34 PM Pengfei Li wrote:
>
> This patch adds vectorization support to early-break loops with gswitch
> statements. Such gswitches may come from original switch-case constructs
> in the source or the iftoswitch pass which rewrites if conditions with a
> chain of comparisons
On Fri, Sep 19, 2025 at 8:58 AM wrote:
>
> From: Pan Li
>
> This patch would like to try to match the unsigned
> SAT_MUL form 5, aka below:
>
> #define DEF_SAT_U_MUL_FMT_5(NT, WT) \
> NT __attribute__((noinline))\
> sat_u_mul_##NT##_from_##WT##_fmt_5 (NT
On Thu, Sep 18, 2025 at 10:31 AM Avinash Jayakar wrote:
>
> Hi,
>
> Following is version 2 of the patch proposed for master aiming to fix
> PR104116. This has been bootstrapped and regtested on powerpc64le with
> regression failures.
> Kindly review.
>
> Just had one question.
> If I have to imple
When PRE asks VN to simplify a NARY but not insert, that bypasses
the abnormal guard in maybe_push_res_to_seq and we blindly accept
new uses of abnormals. The following fixes this.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/122016
* tree-ssa
The following disables the use of rotate patterns with reductions
since it breaks the single rotate SSA use-def chain constraints.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/122023
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Disable
On Mon, Sep 22, 2025 at 4:22 AM liuhongt wrote:
>
> Since it regressed SPEC performance(Refer to PR121994), I guess
> it's related to register pressure and can be tuned by adjusting
> reduc_lat_mult_thr. I don't have a Zen2 machine, so for simplicity, I'll
> just disable unroll in vectorizer for Zen2.
On Sun, Sep 21, 2025 at 5:20 PM Mark Wielaard wrote:
>
> Hi Arsen,
>
> Added Jonathan to CC to get his opinion on the libstdc++ part of the
> documentation (re)generation.
>
> On Mon, Sep 08, 2025 at 06:07:48PM +0200, Arsen Arsenović wrote:
> > Mark Wielaard writes:
> >
> > > I think it is a goo
On Sat, Sep 20, 2025 at 5:15 AM Andrew Pinski
wrote:
>
> This is the first patch in removing fold_all_builtins pass.
> We want to fold __builtin_constant_p into 0 if we know the argument can't be
> a constant. So currently that is done in fab pass (though ranger handles it
> now too).
> Instead o
On Sat, Sep 20, 2025 at 4:24 AM Andrew Pinski wrote:
>
>
>
> On Fri, Sep 19, 2025, 7:08 PM Peter0x44 wrote:
>>
>> On 2025-09-20 02:33, Andrew Pinski wrote:
>> > On Fri, Sep 19, 2025, 6:22 PM Peter Damianov
>> > wrote:
>> >
>> >> This patch implements folding of aggregate assignments (*dest =
>>
On Wed, Sep 17, 2025 at 9:22 AM Robin Dapp wrote:
>
> > We are supposed to not get into
> >
> > if (mask_element != index)
> > noop_p = false;
>
> I guess the problem is the vectype mismatch. We're checking the permutation
> for e.g. V16QI = {0, 1, 2, 3, 8, 9, 10, 11, ...} which, in
g)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -12461,10 +12488,22 @@ vectorizable_early_exit (loop_vec_info loop_vinfo,
> stmt_vec_info stmt_info,
>
>while (workset.length () > 1)
> {
> - new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> tree arg0 = workset.pop ();
> tree arg1 = workset.pop ();
> - new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> + if (addhn_supported_p && workset.length () == 0)
> + {
> + new_stmt = gimple_build_call_internal (ifn, 2, arg0, arg1);
> + vectype_out = narrow_type;
> + new_temp = make_temp_ssa_name (vectype_out, NULL, "vexit_reduc");
> + gimple_call_set_lhs (as_a <gcall *> (new_stmt), new_temp);
> + gimple_call_set_nothrow (as_a <gcall *> (new_stmt), true);
> + }
> + else
> + {
> + new_temp = make_temp_ssa_name (vectype_out, NULL, "vexit_reduc");
> + new_stmt
> + = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> + }
> vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> &cond_gsi);
> workset.quick_insert (0, new_temp);
> @@ -12487,6 +12526,7 @@ vectorizable_early_exit (loop_vec_info loop_vinfo,
> stmt_vec_info stmt_info,
>
>gcc_assert (new_temp);
>
> + tree cst = build_zero_cst (vectype_out);
>gimple_cond_set_condition (cond_stmt, NE_EXPR, new_temp, cst);
>update_stmt (orig_stmt);
>
>
>
>
--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
The following re-implements the fix for PR84830 where the original
fix causes missed optimizations. The issue with PR84830 is that
we end up growing ANTIC_IN value set during iteration which happens
because we conditionally prune values based on ANTIC_OUT - TMP_GEN
expressions. But when ANTIC_OUT
On Wed, Sep 17, 2025 at 10:30 PM Robin Dapp wrote:
>
> When trying to unify the vector_vector_composition variants I noticed that
> there are even more alignment checks than when I last looked ;)
Yeah :/ I don't like the way it's done very much. Having a unified idea of
a "punning" type and do
On Wed, Sep 17, 2025 at 3:38 PM Robin Dapp wrote:
>
> > For a non-STMT_VINFO_STRIDED_P access the DR_GROUP_SIZE is
> > basically the DR_STRIDE, because the DR group models contiguous memory.
>
> You meant DR_STEP? So if step/stride = 100 and we access the first two
> elements at 0, 1, the third i
With no longer visiting TREE_CHAIN for decls we have to visit
the DECL_ARGUMENT chain manually.
LTO bootstrap and regtest running on x86_64-unknown-linux-gnu.
I'm not sure whether LTO bootstrap worked before, but hopefully this
would have fixed it. Testing non-LTO to be able to push it anyway as
The following removes a block I added (and disabled again) when
developing the PR121720 fix.
Bootstrapped on x86_64-unknown-linux-gnu, pushed.
* tree-ssa-pre.cc (compute_antic_aux): Remove dead code.
---
gcc/tree-ssa-pre.cc | 14 --
1 file changed, 14 deletions(-)
diff --git
On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski
wrote:
>
> After copy propagation for aggregates patches we might end up with
> now:
> ```
> tmp = a;
> b = a; // was b = tmp;
> tmp = {CLOBBER};
> ```
> To help out ESRA, it would be a good idea to remove the `tmp = a` statement as
> there is no DSE b
On Fri, 19 Sep 2025, Mikael Morin wrote:
> Le 18/09/2025 à 09:43, Richard Biener a écrit :
> > diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
> > index 99331730bc2..18b36259cb4 100644
> > --- a/gcc/tree-ssa-pre.cc
> > +++ b/gcc/tree-ssa-pre.cc
> > @@ -211
This was only used for non-SLP.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
* tree-vectorizer.h (_stmt_vec_info::store_count): Remove.
(DR_GROUP_STORE_COUNT): Likewise.
* tree-vect-stmts.cc (vect_transform_stmt): Remove non-SLP
path.
---
gcc/tree-ve
The following removes the dual non-SLP/SLP API in favor of only
handling SLP. This also removes the possibility to override
vectype of a SLP node with an inconsistent one while still using
the SLP nodes number of lanes. This requires adjustment of
a few places where such inconsistencies happened.
The following removes the redundant SLP_TREE_NUMBER_OF_VEC_STMTS,
replacing it with vect_get_num_copies. Previously it was already
made sure that all setters adhere to that.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
* tree-vectorizer.h (_slp_tree::vec_stmts_size): Remo
On Thu, 18 Sep 2025, Jakub Jelinek wrote:
> On Thu, Sep 18, 2025 at 06:54:06PM +0200, Jason Merrill wrote:
> > > There are some regressions caused by the removal of {CLOBBER(bob)}
> > > clobbers from the start of certain constructors, e.g. one testcase has
> > > struct A
> > > {
> > >int f,g;
On Fri, Sep 19, 2025 at 12:08 AM Peter Damianov wrote:
>
> POSIX says that sin and cos should set errno to EDOM when infinity is passed
> to
> them. Make sure this is accounted for in builtins.def.
>
> When sin/cos are called with values that set errno (like INFINITY), GCC was
> incorrectly optim
On Thu, Sep 18, 2025 at 10:19 PM Robin Dapp wrote:
>
> > But the vector type we perform the permutation on should be unchanged (it's
> > not the punned type but the original type we pun the loaded vector back to)?
>
> Yeah, I was trying to re-use what we have but I see now that just passing a
> di
On Tue, Sep 9, 2025 at 3:31 AM wrote:
>
> From: Pan Li
>
> The widening-mul will insert a cast for the widen-mul, the
> function build_and_insert_cast is designed to take care of it.
>
> In some cases the optimized gimple has some unnecessary casts,
> for example in the code below.
>
> #define SAT_U_MU
On Thu, 11 Sep 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Thursday, September 11, 2025 12:56 PM
> > To: Tamar Christina
> > Cc: Robin Dapp ; GCC Patches > patc...@gcc.gnu.org>; rdsandif...@googlemail.com
On Thu, Sep 18, 2025 at 1:21 PM Robin Dapp wrote:
>
> Hi,
>
> This patch adds an explicit variant of vect_transform_slp_perm_load that
> just does the analysis part of vect_transform_slp_perm_load.
>
> I find it slightly clearer to indicate "analysis" in the
> function name already rather than hav
The following restricts the number of locations we register a predicate
as valid which avoids the expensive linear search for cases like
if (a)
A;
if (a)
B;
if (a)
C;
...
where we register a != 0 as true for locations A, B, C ... in an
unlimited way. The patch simply c
& SCALAR_INT_MODE_P (mode)
> +&& INTEGRAL_TYPE_P (TREE_TYPE (treeop0))
> && (GET_MODE_SIZE (as_a (mode))
> > GET_MODE_SIZE (as_a (GET_MODE (op0
> && get_range_pos_neg (treeop0,
>
> Jakub
>
>
On Wed, Sep 17, 2025 at 2:40 AM Andrew Pinski
wrote:
>
> On Tue, Sep 16, 2025 at 5:54 AM Richard Biener
> wrote:
> >
> > On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski
> > wrote:
> > >
> > > After copy propagation for aggregates patches we might e
On Mon, Sep 8, 2025 at 2:22 PM Iain Sandoe wrote:
>
>
>
> > On 8 Sep 2025, at 12:46, Richard Biener wrote:
> >
> > On Mon, Sep 8, 2025 at 1:04 PM Ville Voutilainen
> > wrote:
> >>
> >> On Mon, 8 Sept 2025 at 13:54, Richard Biener
On Fri, 12 Sep 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Friday, September 12, 2025 1:40 PM
> > To: Tamar Christina
> > Cc: Robin Dapp ; GCC Patches > patc...@gcc.gnu.org>; rdsandif...@googlemail.com
On Mon, Sep 15, 2025 at 8:53 PM Robin Dapp wrote:
>
> Hi,
>
> This patch adds gather/scatter handling for grouped access. The idea is
> to e.g. replace an access (for uint8_t elements) like
> arr[0]
> arr[1]
> arr[2]
> arr[3]
> arr[0 + step]
> arr[1 + step]
> ...
> by a gather load
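To make the grouped-access idea concrete, here is a minimal sketch in plain C. It is not the vectorizer's implementation. It assumes (this is a reading of the truncated description, not something the snippet states) that each group of four uint8_t elements is moved as one 32-bit element, and that `step` is a byte distance between group starts. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { GROUP_SIZE = 4 };  /* four uint8_t elements per group */

/* Element-wise form: arr[0..3], arr[step + 0..3], arr[2*step + 0..3], ... */
static void load_group_bytes(const uint8_t *arr, size_t step, size_t groups,
                             uint8_t *out)
{
    for (size_t g = 0; g < groups; g++)
        for (size_t k = 0; k < GROUP_SIZE; k++)
            out[g * GROUP_SIZE + k] = arr[g * step + k];
}

/* Same data movement, but one 32-bit load per group (assumed reading of the
   "gather load" in the description).  memcpy avoids alignment and aliasing
   problems and keeps the byte order unchanged.  */
static void load_group_words(const uint8_t *arr, size_t step, size_t groups,
                             uint8_t *out)
{
    for (size_t g = 0; g < groups; g++) {
        uint32_t word;
        memcpy(&word, arr + g * step, sizeof word);
        memcpy(out + g * GROUP_SIZE, &word, sizeof word);
    }
}

/* Returns 1 when both forms produce identical output for a small input.  */
static int grouped_loads_agree(void)
{
    uint8_t arr[32];
    uint8_t by_bytes[12];
    uint8_t by_words[12];

    for (size_t i = 0; i < sizeof arr; i++)
        arr[i] = (uint8_t)(i * 3 + 1);

    load_group_bytes(arr, 10, 3, by_bytes);
    load_group_words(arr, 10, 3, by_words);
    return memcmp(by_bytes, by_words, sizeof by_bytes) == 0;
}
```

The point of the sketch is only that the two loops touch the same bytes; whether a target can do the wider gather efficiently is a separate question the patch has to answer.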
On Mon, Sep 8, 2025 at 10:15 AM liuhongt wrote:
>
> SLP may take a broadcast as kind of vec_perm, the patch checks the
> permutation index to exclude those false positives.
>
> > > > so the vectorizer costs sth with count == 0? I'll see to fix that,
> > > > but this also
> > > > means the code sh
On Mon, Sep 8, 2025 at 3:54 PM Iain Sandoe wrote:
>
>
>
> > On 8 Sep 2025, at 14:40, Richard Biener wrote:
> >
> > On Mon, Sep 8, 2025 at 3:16 PM Jakub Jelinek wrote:
> >>
> >> On Mon, Sep 08, 2025 at 03:05:58PM +0200, Richard Biener wrote:
On Wed, Sep 17, 2025 at 1:15 PM Robin Dapp wrote:
>
> > On Wed, Sep 17, 2025 at 9:22 AM Robin Dapp wrote:
> >>
> >> > We are supposed to not get into
> >> >
> >> > if (mask_element != index)
> >> > noop_p = false;
> >>
> >> I guess the problem is the vectype mismatch. We're checkin
When transitioning gcc.dg/torture/pr84830.c to a GIMPLE testcase to
feed the IL into PRE that caused the original issue (and verify it's
still there with the fix reverted), I noticed we put up SSA operands
before having fully parsed the function and thus with not all
variables having the final TREE
> >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener
> > > > > Sent: Tuesday, September 16, 2025 3:03 PM
> > > > > To: Liu, Hongtao
> > > > > Cc: gcc-patches@
On Wed, Sep 17, 2025 at 4:55 AM Andrew Pinski
wrote:
>
> Since now optimize_aggr_zeroprop and optimize_agr_copyprop work by forward
> walk to prop
> the zero/aggregate and does not change the statement at hand, there is no
> reason to
> repeat the loop if they do anything. This will prevent pro
On Wed, Sep 17, 2025 at 3:18 AM Andrew Pinski
wrote:
>
> While running uninclude on PR99912's preprocessed source uninclude
> didn't uninclude some of the x86_64 target headers. This was because
> `lib/gcc//include` was not noticed as a possible system
> include dir. It supported `gcc-lib//includ
On Wed, Sep 17, 2025 at 12:33 AM Andrew Pinski
wrote:
>
> After r16-3887-g597b50abb0d2fc, the check to see if the copy is
> a nop copy becomes inefficient. The code goes into an infinite
> loop as the copy keeps on being propagated over and over again.
>
> That is if we have:
> ```
> struct s1
On Wed, Sep 17, 2025 at 12:33 AM Andrew Pinski
wrote:
>
> If both operands that are being compared are decls, operand_equal_p will
> already
> handle that case so an early out can be done here.
>
> Bootstrapped and tested on x86_64-linux-gnu.
OK.
> gcc/ChangeLog:
>
> * tree-ssa-forwprop
On Tue, Sep 16, 2025 at 4:15 PM Robin Dapp wrote:
>
> > Well, what you want to catch now isn't single-lane anymore. But I guess
> > since
> > we now check the permute before this we can rely on check for n_perms == 0
> > to catch the "no actual permutation required" case?
>
> I'm seeing n_perms =
On Tue, Sep 16, 2025 at 10:30 AM Eric Botcazou wrote:
>
> > I mean TREE_READONLY on ..._REF nodes. We can't rely on the absence of
> > TREE_READONLY on ..._REF meaning the object is writable, so the flag does
> > not add any information (but maybe some costing hint that the object is
> > definite
On Tue, Sep 16, 2025 at 3:07 PM Robin Dapp wrote:
>
> > I think this now conflicts a bit with what I just pushed (sorry).
> >
> >>&& loop_vinfo)
> >> {
> >> + unsigned i, j;
> >> + bool simple_perm_series = true;
> >> + FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (slp_n
On Mon, 15 Sep 2025, Avinash Jayakar wrote:
> Hello Richard,
>
> Thank you for reviewing the patch! I have made changes based on your
> comments, but I have some doubts for a few comments as mentioned below.
>
> On Thu, 2025-09-11 at 13:08 +0200, Richard Biener wrote:
>
On Tue, Sep 16, 2025 at 5:22 AM wrote:
>
> From: Pan Li
>
> This patch would like to try to match the unsigned
> SAT_MUL form 4, aka below:
>
> #define DEF_SAT_U_MUL_FMT_5(NT, WT) \
> NT __attribute__((noinline))\
> sat_u_mul_##NT##_from_##WT##_fmt_5 (NT
scan-tree-dump-times "epilogue loop vectorized using masked
> 64 byte vectors" 1 "vect" } } */
> /* { dg-final { scan-tree-dump-not "loop vectorized using 32 byte vectors"
> "vect" } } */
>
On Tue, Sep 16, 2025 at 9:53 AM Liu, Hongtao wrote:
>
>
>
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Tuesday, September 16, 2025 3:03 PM
> > To: Liu, Hongtao
> > Cc: gcc-patches@gcc.gnu.org; hjl.to...@gmail.com
> > Subject: Re:
On Mon, Sep 15, 2025 at 7:20 PM Andrew Pinski
wrote:
>
> This moves the code used in optimize_agr_copyprop_1 (r16-3887-g597b50abb0d)
> to handle this same case into its new function and use it inside
> optimize_agr_copyprop_arg. This allows to remove more copies that show up only
> in arguments.
>
On Tue, Sep 16, 2025 at 7:53 AM liuhongt wrote:
>
> From: "hongtao.liu"
>
> Align move_max with prefer_vector_width for SPR/GNR/DMR to avoid STLF issue.
> It's similar to the previous commit.
>
> commit 6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
> Author: liuhongt
> Date: Thu Aug 15 12:54:07 2024 +0
On Mon, Sep 15, 2025 at 9:29 PM Eric Botcazou wrote:
>
> > Yes. So I read the comment in a way to say that TREE_THIS_NOTRAP does not
> > mean the reference is writable. In some context we check
> >
> > || tree_could_trap_p (lhs)
> >
> > /* tree_could_trap_p is a predicate for
On Mon, Sep 15, 2025 at 8:51 PM Robin Dapp wrote:
>
> Hi,
>
> This patch removes the type argument from the vector_misalignment hook.
> Ever since we switched from element to byte misalignment its
> semantics haven't been particularly clear and nowadays it should be
> redundant.
>
> Also, in case
On Thu, Sep 11, 2025 at 12:03 PM Robin Dapp wrote:
>
> Hi,
>
> This patch adjusts vect_gather_scatter_fn_p to always check an offset
> type with swapped signedness (vs. the original offset argument).
> If the target supports the gather/scatter with the new offset type the
> offset is converted to
On Tue, Sep 9, 2025 at 3:31 AM wrote:
>
> From: Pan Li
>
> The widen-mul removed the unnecessary cast, thus adjust the
> SAT_MUL of wide-mul to a simpler form.
OK.
> gcc/ChangeLog:
>
> * match.pd: Remove unnecessary cast of unsigned
> SAT_MUL for widen-mul.
>
> Signed-off-by: Pa
On Mon, Sep 15, 2025 at 12:05 PM Eric Botcazou wrote:
>
> > Yes please. Can we assert the MEM_REF offset is zero and the MEM_REF
> > isn't type-punning, aka TREE_TYPE of the MEM_REF is compatible with
> > the decls type, or is this not easily possible (again because of the
> > placeholders)?
>
>
On Mon, Sep 15, 2025 at 1:44 PM Eric Botcazou wrote:
>
> > Ah, I wasn't aware of that. This makes TREE_THIS_NOTRAP possibly not
> > usable for tree_could_trap_p :/ One could read the docs so that it means
> > when you have a read with TREE_THIS_NOTRAP then you can't infer
> > from that that writ
On Thu, Sep 11, 2025 at 3:16 PM Matteo Nicoli
wrote:
>
> I am writing this follow-up email to specify that I executed the tests
> contained in this patch on aarch64-arm64-linux-gnu
The changelog part of the commit message is formatted wrongly.
* gcc/match.pd: added the following optimizations
The following unifies the vect_transform_slp_perm_load call done
in vectorizable_load with that eventually done in get_load_store_type.
On the way it fixes the conditions on which we can allow
VMAT_ELEMENTWISE or VMAT_GATHER_SCATTER when there's a SLP permutation
(and we arrange to not code generat
On Tue, Sep 9, 2025 at 6:17 AM Andrew Pinski
wrote:
>
> It turns out to be easy to add support for memcpy copy prop when the memcpy
> has been changed into a `MEM` copy.
> Instead of rejecting right out we need to figure out that
> `a` and `MEM[&a]` are equivalent in terms of address and size.
> And then creat
On Mon, Sep 15, 2025 at 11:52 AM Robin Dapp wrote:
>
> > The rest of the GCN hook is quite inconsistent, it says misalignment
> > == -1 is OK, is_packed
> > is not and then the above ... and gather-scatter is also always OK
> > (even if the scalar accesses
> > are 'packed' aka not naturally aligne
On Mon, Sep 15, 2025 at 9:11 AM Robin Dapp wrote:
>
> >> In that case I relied on !is_packed for riscv.
> >
> > I guess it's easiest to keep is_packed then, but is it having the data
> > accesses aligned to element _size_ or to element mode alignment?
> > For gather a target couldn't distinguish t
On Mon, Sep 15, 2025 at 11:23 AM Eric Botcazou wrote:
>
> > Do we need to ensure that, for the MEM_REF case at least, the DECL is of
> > appropriate size with respect to the TREE_TYPE of the MEM_REF and
> > the offset (TREE_OPERAND (*tp, 1))? That is, consider
> >
> > ptr = &too_small_object;
>
On Mon, Sep 15, 2025 at 8:03 AM Eric Botcazou wrote:
>
> Hi,
>
> For parameters passed by reference, the Ada compiler sets TREE_THIS_NOTRAP on
> their dereference to prevent tree_could_trap_p from returning true and then
> causing a new basic block to be created for every access to them (given tha
On Sun, Sep 14, 2025 at 8:17 PM Andrew Pinski
wrote:
>
> This pattern shows up with some C++ code (std::vector) where we get:
> ```
> _9 = _201 - _36;
> _10 = (long unsigned int) _9;
> _11 = -_10;
> _12 = _201 + _11;
> ```
>
> In the original code it was `end - (end - begin)` but with inli
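As a small illustration of why the four-statement sequence above is redundant, the sketch below mirrors it in C. It is a simplification: everything is done in `unsigned long` so wrap-around is well defined, whereas the GIMPLE has a separate conversion step. The function name is hypothetical.

```c
#include <assert.h>

/* Mirrors the quoted sequence with a standing in for _201 and b for _36:
     _9  = a - b
     _10 = (unsigned long) _9
     _11 = -_10
     _12 = a + _11
   In modular arithmetic a + -(a - b) is simply b.  */
static unsigned long end_minus_distance(unsigned long a, unsigned long b)
{
    unsigned long diff = a - b;        /* _9 and _10 */
    unsigned long negated = -diff;     /* _11 */
    return a + negated;                /* _12 */
}

/* Self-check: the result is always the second operand, even when a < b.  */
static void check_end_minus_distance(void)
{
    assert(end_minus_distance(100UL, 40UL) == 40UL);
    assert(end_minus_distance(5UL, 9UL) == 9UL);
}
```

So a simplification that replaces `_12` with `_36` directly does not change the value computed.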
On Sun, 14 Sep 2025, Sam James wrote:
> Richard Biener writes:
>
> > With no longer visiting TREE_CHAIN for decls we have to visit
> > the DECL_ARGUMENT chain manually.
> >
> > LTO bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> >
On Fri, 12 Sep 2025, Nathaniel Shead wrote:
> On Fri, Sep 12, 2025 at 09:13:18AM +0200, Richard Biener wrote:
> > On Fri, 12 Sep 2025, Nathaniel Shead wrote:
> >
> > > On Thu, Sep 11, 2025 at 11:08:54AM +0200, Richard Biener wrote:
> > > > On Th
> Am 12.09.2025 um 19:03 schrieb Jeff Law :
>
> Shreya's work to add the addptr pattern on the RISC-V port exposed a latent
> bug in LRA.
>
> We lazily allocate/reallocate the ira_reg_equiv structure and when we do
> (re)allocation we'll over-allocate and zero-fill so that we don't have to
The following tries to do vect_transform_slp_perm_load exactly
once during analysis and once during transform. There's a 2nd
case left during analysis in get_load_store_type. Temporarily
this records n_perms in the load-store info and verifies that
against the value computed at transform stage.
On Fri, Sep 12, 2025 at 12:09 PM Robin Dapp wrote:
>
> > I wonder in which cases we have misalignment == -1 but !is_packed?
>
> Isn't that just the new case of gather_scatter (without punning and when we
> couldn't analyze the dataref)? The dataref might be naturally aligned but we
> explicitly s
This adds permute_info_type and removes the duplication from
vect_schedule_slp_node.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
* tree-vectorizer.h (stmt_vec_info_type::permute_info_type): Add.
(vectorizable_slp_permutation): Declare.
* tree-vect-slp.cc (ve
The following makes us always use VMAT_STRIDED_SLP for negative
stride multi-element accesses. That handles falling back to
single element accesses transparently.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
* tree-vect-stmts.cc (get_load_store_type): Use VMAT_STRIDED_SLP
"target.\n");
> + addhn_supported_p = false;
> + }
> +}
> +
>/* Analyze only. */
>if (cost_vec)
> {
> - if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> + if (!addhn_supported_p
>
On Fri, 12 Sep 2025, Nathaniel Shead wrote:
> On Thu, Sep 11, 2025 at 11:08:54AM +0200, Richard Biener wrote:
> > On Thu, 11 Sep 2025, Richard Biener wrote:
> >
> > > On Wed, 10 Sep 2025, Nathaniel Shead wrote:
> > >
> > > > Does this fix seem
On Thu, Sep 11, 2025 at 6:06 PM Robin Dapp wrote:
>
> > Hmm, so the existing "punning" code for VMAT_STRIDED_SLP does
> >
> > tree vtype
> > = vector_vector_composition_type (vectype, const_nunits / n,
> > &ptype);
> >
On Fri, 5 Sep 2025, Richard Biener wrote:
> The PR reports
>
> vectorizer.h:276:3: runtime error: load of value 32695, which is not a valid
> value for type 'internal_fn'
>
> which I believe is from
>
> slp_node->data = new vect_load_store_data (st
effective-target lto }
> +// { dg-additional-options "-fmodules -flto" }
> +
> +export module M;
> +export template struct S;
> +export template void foo(S) {}
> +template struct S {
> + friend void foo<>(S);
> +};
> diff --git a/gcc/testsuite/g++.dg/
does only calls the IFN when at least one lane is
> active.
>
> I do not believe I need a LEN version here either? But if I'm wrong it would
> be useful to have a small example.
I think you need a len variant unless the mask producer had len
applied with an else value of 0 (IIRC RVV always prefers 'undefined'
as else value). OTOH the "first" element - if one is set and we never
require 'else' - should work with or without loop masking (with len or
mask).
That said, I do wonder why we have both extract_last and
fold_extract_last.
Richard.
>
> Thanks,
> Tamar
>
uld simply say: New testcase.
I have pushed the change with these adjustments as r16-3802-gaa4aafbad5235f
Richard.
> Best regards,
> Matteo
>
>
>
> On 5 Sep 2025, at 9:27 am, Richard Biener wrote:
>
> On Thu, Sep 4, 2025 at 4:15 PM Matteo Nicoli
> wrote:
>
>
>
ULL);
> + def_stmt = gimple_build_assign(extr_cond, COND_EXPR, cond_reg4,
> +build_int_cst(itype, 1), build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // q -= (x ^ y < 0 && r) ? 1 : 0
> + tree floor_mod_r = vect_recog_temp_ssa_var(itype, NULL);
> + pattern_stmt = gimple_build_assign(floor_mod_r, MINUS_EXPR, q,
> extr_cond);
> +}
You are emitting code that might not be vectorizable and which needs
post-processing with bool vector patterns. So you should
1) use the appropriate precision scalar bools, anticipating the vector
mask type used
2) check at least whether the compares are supported, I think we can
rely on bit operations support
Richard.
> + }
> }
>
>/* Pattern detected. */
>
On Mon, Sep 8, 2025 at 7:58 PM Edwin Lu wrote:
>
> This patch tries to add support for a variant of SAT_TRUNC where
> negative numbers are clipped to 0 instead of NARROW_TYPE_MAX_VALUE.
> This form is seen in x264, aka
>
> UT clip (T a)
> {
> return a & (UT)(-1) ? (-a) >> 31 : a;
> }
>
> Where s
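For readers unfamiliar with the clip idiom, here is a hedged sketch with concrete types. It is modelled on the shape quoted above but is not taken from the patch: the choice of int32_t and uint8_t, the mask `~255`, and the function name are all assumptions made for illustration. It also relies on right-shifting a negative value being an arithmetic shift, which is implementation-defined in C (GCC documents it as arithmetic).

```c
#include <assert.h>
#include <stdint.h>

/* Clip a 32-bit signed value into 0..255.
   - If no bits above the low eight are set, the value already fits.
   - Otherwise (-a) >> 31 is 0 for negative a and -1 for a > 255 (assuming an
     arithmetic shift), which narrows to 0 and 255 respectively.
   a == INT32_MIN is not handled: negating it overflows.  */
static uint8_t clip_to_uint8(int32_t a)
{
    return (a & ~255) ? (uint8_t)((-a) >> 31) : (uint8_t)a;
}

/* Self-check of the three cases.  */
static void check_clip_to_uint8(void)
{
    assert(clip_to_uint8(77) == 77);    /* in range: unchanged */
    assert(clip_to_uint8(-5) == 0);     /* negative: clipped to 0 */
    assert(clip_to_uint8(300) == 255);  /* too large: clipped to max */
}
```

This matches the description's point that negative inputs go to 0 rather than to the narrow type's maximum.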
On Mon, Sep 8, 2025 at 3:19 PM Robin Dapp wrote:
>
> Hi,
>
> This patch adds gather/scatter handling for grouped access. The idea is
> to e.g. replace an access (for uint8_t elements) like
> arr[0]
> arr[1]
> arr[2]
> arr[3]
> arr[0 + step]
> arr[1 + step]
> ...
> by a gather load o
When a dead EH or abnormal edge makes a call queued for noreturn fixup
unreachable, just skip processing it.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/121870
* tree-ssa-propagate.cc
(substitute_and_fold_engine::substitute_and_fold):
On Thu, 11 Sep 2025, Richard Biener wrote:
> On Wed, 10 Sep 2025, Nathaniel Shead wrote:
>
> > Does this fix seem reasonable, or is there something I've missed?
> >
> > My change to g++.dg/lto/pr101396_0.C also causes it to fail link with
> > some flags on
re or backport to 12 without a testcase (assuming a suitable one
> can't be crafted).
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
> PR tree-optimization/121772
> * match.pd: Add type check to reduc(ctor) pattern.
>
> gcc/testsuite/ChangeLog:
>
>
When the vectorizer removes a forwarder created earlier by split_edge
it uses redirect_edge_pred for convenience and efficiency. That breaks
down when the edge split is originating from an asm goto as that is
a jump that needs adjustments from redirect_edge_and_branch. The
following factors a si
> Am 10.09.2025 um 09:27 schrieb Jakub Jelinek :
>
> Hi!
>
> I thought this wouldn't be necessary because RAW_DATA_CST can only appear
> inside of (array) CONSTRUCTORs within DECL_INITIAL of TREE_STATIC vars,
> so there shouldn't be a need to expand it. Except that we have an
> optimization
> Am 10.09.2025 um 09:50 schrieb Jakub Jelinek :
>
> Hi!
>
> The lowering of .{ADD,SUB,MUL}_OVERFLOW ifns is optimized, so that in the
> common cases we don't uselessly create a large _Complex _BitInt
> temporary with the first (real) part being the result and second (imag) part
> just
> Am 10.09.2025 um 10:01 schrieb Jakub Jelinek :
>
> Hi!
>
> This is something that has bothered me for a few years but I've only found
> time for it now.
> The glob used for finding *_1.* etc. counterparts to the *_0.* tests is too
> broad, so if one has say next to *_1.c file also *_1.c~ or
When there's an asm goto in the latch of a loop we may not use
IP_END IVs since instantiating those would (need to) split the
latch edge which in turn invalidates IP_NORMAL position handling.
This is a revision of the PR107997 fix.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> Am 09.09.2025 um 12:54 schrieb Juergen Christ :
>
> The length returned by vect_get_loop_len is REALLEN + BIAS, but was
> assumed to be REALLEN - BIAS. If BIAS is -1, this leads to wrong
> code.
>
> Bootstrapped and regtested on s390. Ok for trunk?
Ok. Can you also test on ppc64le which
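The sign error described in the quoted patch can be shown with plain arithmetic. This is only a sketch of the arithmetic, not of the vectorizer code; the function names are hypothetical.

```c
#include <assert.h>

/* Biased length as the description states it: REALLEN + BIAS.  */
static long biased_len(long real_len, long bias)
{
    return real_len + bias;
}

/* Correct inverse: subtract the bias again.  */
static long real_len_correct(long biased, long bias)
{
    return biased - bias;
}

/* Inverse under the mistaken assumption that biased == REALLEN - BIAS.  */
static long real_len_mistaken(long biased, long bias)
{
    return biased + bias;
}

/* Self-check with REALLEN = 8 and BIAS = -1.  */
static void check_bias_example(void)
{
    long biased = biased_len(8, -1);               /* 7 */
    assert(real_len_correct(biased, -1) == 8);
    assert(real_len_mistaken(biased, -1) == 6);    /* off by two */
}
```

With a bias of 0 both inverses agree, which is consistent with the problem only showing up on targets that use a bias of -1.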
On Mon, Sep 8, 2025 at 11:44 AM Iain Sandoe wrote:
>
>
>
> > On 8 Sep 2025, at 08:57, Richard Biener wrote:
> >
> > On Sun, Sep 7, 2025 at 9:43 PM Iain Sandoe wrote:
> >>
> >> Thanks for the helpful input from reviewers;
> >>
> >
On Sun, Sep 7, 2025 at 9:43 PM Iain Sandoe wrote:
>
> Thanks for the helpful input from reviewers;
>
> This version has 4 changes from v1:
> 1. removes some unrelated changes.
> 2. As per Jakub's observations, we now special-case
>std::observable_checkpoint so that it is guaranteed to be lower
> Am 08.09.2025 um 17:53 schrieb Iain Sandoe :
>
>
>
>>> On 8 Sep 2025, at 15:53, Richard Biener wrote:
>>>
>>>
>>>
>>>> Am 08.09.2025 um 16:28 schrieb Iain Sandoe :
>>>
>>>
>>>
>>>>
> Am 08.09.2025 um 16:28 schrieb Iain Sandoe :
>
>
>
>>> On 8 Sep 2025, at 15:20, Iain Sandoe wrote:
>>>
>>>
>>>
On 8 Sep 2025, at 15:05, Jakub Jelinek wrote:
>>>
>>> On Mon, Sep 08, 2025 at 02:54:18PM +0100, Iain Sandoe wrote:
(for pre-conditions) they lower to a series of
On Mon, Sep 8, 2025 at 3:16 PM Jakub Jelinek wrote:
>
> On Mon, Sep 08, 2025 at 03:05:58PM +0200, Richard Biener wrote:
> > is reduced to __builtin_abort () (for C++). That's because it's
> > __builtin_unreachable () at the end. I am not aware of any
> > oth
On Mon, 8 Sep 2025, Jakub Jelinek wrote:
> On Mon, Sep 08, 2025 at 02:38:48PM +0200, Richard Biener wrote:
> > When there's an asm goto in the latch of a loop we may not use
> > IP_END IVs since instantiating those would (need to) split the
> > latch edge which in
The SLP reduc-index computation is confused by having an outer reduction
inner loop nested cycle fed by another non-reduction nested cycle.
Instead of undoing the unfortunate mixing of outer reduction inner
cycles with general nested cycles the following instead distinguishes
them by not setting ST
On Mon, Sep 8, 2025 at 1:04 PM Ville Voutilainen
wrote:
>
> On Mon, 8 Sept 2025 at 13:54, Richard Biener
> wrote:
> > That said, I see no point in std::observable_checkpoint to be represented
> > in the IL at all if all it is is to cater to FUD around what compilers might
&