Re: [PATCH] lto: Don't check obj.found for offload section

2024-09-03 Thread Richard Biener
On Tue, Sep 3, 2024 at 5:44 PM H.J. Lu wrote: > > On Fri, Aug 23, 2024 at 5:50 AM Richard Biener > wrote: > > > > On Fri, Aug 23, 2024 at 2:36 PM H.J. Lu wrote: > > > > > > obj.found is the number of LTO symbols. We should include the offload > > > section when it is used by linker even if ther

[PATCH v2] testsuite: introduce hostedlib effective target

2024-09-03 Thread Alexandre Oliva
On Nov 9, 2023, Mike Stump wrote: > On Nov 8, 2023, at 8:29 AM, Alexandre Oliva wrote: >> >> On Nov 5, 2023, Mike Stump wrote: >> >>> that, otherwise, I'll approve this version. >> >> FWIW, this version is not usable as is. Something went wrong > Updates and fixes to the original plan ar

Re: [PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Kyrylo Tkachov
Hi Andrew, > On 3 Sep 2024, at 20:11, Andrew Pinski wrote: > > This moves the check for # of statements to copy in join to > be the first check. This check is the cheapest check so it > should be first. Plus add a print to the dump

[PATCH] Match: Fix ordered and nonequal

2024-09-03 Thread Hu, Lin1
Hi, all This patch is a fix. We need to add :c for bit_and, because bit_and is commutative. Also, (ltgt @0 @1) is simpler than (bit_not (uneq @0 @1)). Bootstrapped/regtested on x86-64-pc-linux-gnu, OK for trunk? BRs, Lin gcc/ChangeLog: * match.pd: Fix match for (bit_and (ordered @
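The equivalence behind this fix can be checked at the source level. The sketch below is only an illustration in C++ using the standard <cmath> comparison macros/functions; it is not the match.pd pattern itself, and the helper names (ordered_and_ne, ltgt, not_uneq, check_identities) are made up for this example. It shows that "ordered and not equal" and "not (unordered or equal)" both agree with "less or greater", including for NaN operands.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// "ordered && not equal": the shape the preview calls
// (bit_and (ordered ...) (ne ...)).
bool ordered_and_ne(double a, double b) {
    return !std::isunordered(a, b) && a != b;
}

// "less or greater" (LTGT): true only when both operands are ordered
// and differ.
bool ltgt(double a, double b) {
    return std::islessgreater(a, b);
}

// Negation of "unordered or equal" (UNEQ).
bool not_uneq(double a, double b) {
    return !(std::isunordered(a, b) || a == b);
}

// Exhaustively compare the three predicates on a few sample values,
// including NaN and signed zeros.
void check_identities() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double samples[] = {0.0, -0.0, 1.0, -2.5, nan};
    for (double a : samples) {
        for (double b : samples) {
            assert(ordered_and_ne(a, b) == ltgt(a, b));
            assert(not_uneq(a, b) == ltgt(a, b));
        }
    }
}
```

Because the conjunction is commutative, swapping the two conditions in ordered_and_ne gives the same result, which is the source-level counterpart of marking bit_and with :c.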

Re: [PATCH] expand: Add dump for costing of positive divides

2024-09-03 Thread Richard Biener
> On 04.09.2024 at 04:00, Andrew Pinski wrote: > > While trying to understand PR 115910 I found it was useful to print out > the two costs of doing a signed and unsigned division just like was added in > r15-3272-g3c89c41991d8e8 for popcount==1. > > Bootstrapped and tested on x86_64-linux-g

[PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? This patch introduces support for vectorized FMA operations for bf16 types in V2BF and V4BF modes on the i386 architecture. New mode iterators and define_expand entries for fma, fnma, fms, and fnms operations are added in mmx.md, e

[PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi This patch adds support for bf16 operations in V2BF and V4BF modes on i386, handling signbit, xorsign, copysign, abs, neg, and various logical operations. Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Ad

[committed] CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

2024-09-03 Thread Hans-Peter Nilsson
I thought I had already committed this, but it looks like it was left dangling when the make_more_copies patch (now committed) was in limbo and I disabled late-combine for (coremark) performance reasons. FWIW that's still a reason at r15-3386-gaf1500dd8c00 (2.6% regression). Tested cris-elf with/

[PATCH] expand: Add dump for costing of positive divides

2024-09-03 Thread Andrew Pinski
While trying to understand PR 115910 I found it was useful to print out the two costs of doing a signed and unsigned division just like was added in r15-3272-g3c89c41991d8e8 for popcount==1. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * expr.cc (expand_expr_divmod): Add d

[PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-03 Thread Levy Hsu
Hi This change adds BFmode support to the ix86_preferred_simd_mode function enhancing SIMD vectorization for BF16 operations. The update ensures optimized usage of SIMD capabilities improving performance and aligning vector sizes with processor capabilities. Bootstrapped and tested on x86-64-pc-l

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Palmer Dabbelt
On Tue, 03 Sep 2024 18:05:42 PDT (-0700), Kito Cheng wrote: I don't see there is conflict if we want to support both gnu2024 and RVI profiles? Ya, they'd just be two different things aimed at solving the same set of problems. I'm just tired of users coming and complaining that stuff is broke

Re: [pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Jason Merrill
On 9/3/24 7:00 PM, Andrew Pinski wrote: On Tue, Sep 3, 2024 at 3:01 PM Jason Merrill wrote: Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- I don't see any reason why we can't allow the [[]] attribute syntax in C++98 mode with a pedwarn just like many other C++11 features. In fact,

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Andrew Waterman
As is normally the case when it comes to matters of RISC-V International, Palmer is taking the least-charitable interpretation and then adding a generous dollop of falsehoods. The RVA23U64 profile is set to be ratified soon, and that's our intended target for apps processors. On Tue, Sep 3, 2024

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Kito Cheng
I don't see there is conflict if we want to support both gnu2024 and RVI profiles? also I am not sure what the usage scenarios for the gnu2024 and how we defined that? On Wed, Sep 4, 2024 at 6:49 AM Palmer Dabbelt wrote: > > On Tue, 20 Aug 2024 23:18:36 PDT (-0700), jia...@iscas.ac.cn wrote: > >

[PUSHED] aarch64: Fix testcase vec-init-22-speed.c [PR116589]

2024-09-03 Thread Andrew Pinski
For this testcase, the trunk produces: ``` f_s16: fmov s31, w0 fmov s0, w1 ``` While the testcase was expecting what was produced in GCC 14: ``` f_s16: sxth w0, w0 sxth w1, w1 fmov d31, x0 fmov d0, x1 ``` After r15-1575-gea8061f46a

[PATCH] object-size: Use simple_dce_from_worklist in object-size pass

2024-09-03 Thread Andrew Pinski
While trying to see if there was a way to improve object-size pass to use the ranger (for pointer plus), I noticed that it leaves around the statement containing __builtin_object_size if it was reduced to a constant. This fixes that by using simple_dce_from_worklist. Bootstrapped and tested on x86

Re: [pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Andrew Pinski
On Tue, Sep 3, 2024 at 3:01 PM Jason Merrill wrote: > > Tested x86_64-pc-linux-gnu, applying to trunk. > > -- 8< -- > > I don't see any reason why we can't allow the [[]] attribute syntax in C++98 > mode with a pedwarn just like many other C++11 features. In fact, we > already do support it in so

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Palmer Dabbelt
On Tue, 20 Aug 2024 23:18:36 PDT (-0700), jia...@iscas.ac.cn wrote: On 2024/8/21 3:23, Palmer Dabbelt wrote: On Mon, 19 Aug 2024 21:53:54 PDT (-0700), jia...@iscas.ac.cn wrote: Supports RISC-V profiles[1] in -march option. Default input set the profile before other formal extensions. V2: Fixes s

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Qing Zhao
thanks. Updated per your suggestion and pushed: https://gcc.gnu.org/pipermail/gcc-cvs/2024-September/408749.html Qing > On Sep 3, 2024, at 10:09, Jakub Jelinek wrote: > > On Tue, Sep 03, 2024 at 01:59:45PM +, Qing Zhao wrote: >> Hi, Jakub, >> >> I’d like to ping this simple patch again.

Re: [PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that:

Re: [PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that:

Re: [PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This moves the check for # of statements to copy in join to be the first check. This check is the cheapest check so it should be first. Plus add a print to the dump file since there was none beforehand. gcc/ChangeLog: * gimple-ssa-split-paths.

[PATCH] c++: ICE with TTP [PR96097]

2024-09-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside a template. That happens here because in template typename X> void func() {} template struct Y {}; void g() { func(); } when performing overload

[pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- I don't see any reason why we can't allow the [[]] attribute syntax in C++98 mode with a pedwarn just like many other C++11 features. In fact, we already do support it in some places in the grammar, but not in places that check cp_nth_token

Re: [PING^3] [PATCH] PR116080: Fix test suite checks for musttail

2024-09-03 Thread Mike Stump
On Sep 2, 2024, at 4:23 PM, Andi Kleen wrote: > > Andi Kleen writes: > > PING^3 Ok. >> Andi Kleen writes: >> >> PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html >> >> This fixes some musttail related test suite failures that cause noise on >> various targets. >>

[pushed 3/3] pretty-print: split up pretty_printer::format into subroutines

2024-09-03 Thread David Malcolm
The body of pretty_printer::format is almost 500 lines long, mostly comprising two distinct phases. This patch splits it up so that there are explicit subroutines for the two different phases, reducing the scope of various locals, and making it easier to e.g. put a breakpoint on phase 2. No funct

[pushed 1/3] pretty-print: naming cleanups

2024-09-03 Thread David Malcolm
This patch is a followup to r15-3311-ge31b6176996567 making some cleanups to pretty-printing to reflect those changes: - renaming "chunk_info" to "pp_formatted_chunks" - renaming "cur_chunk_array" to "m_cur_formatted_chunks" - rewording/clarifying comments and taking the opportunity to add a "m_" pr

[pushed 2/3] pretty-print: add selftest of pp_format's stack

2024-09-03 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-3430-gd0891f3aa75d31. gcc/ChangeLog: * pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New accessor. * pretty-print.cc (selftest::push_pp_format): New. (ASSERT_TEXT_TOK

[PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-09-03 Thread Pengxuan Zheng
This is similar to the recent improvements to the Advanced SIMD popcount expansion by using SVE. We can utilize SVE to generate more efficient code for scalar mode popcount too. PR target/113860 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (popcount2): Update pattern to

[PATCH] c++: noexcept and pointer to member function type [PR113108]

2024-09-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- We ICE in nothrow_spec_p because it got a DEFERRED_NOEXCEPT. This DEFERRED_NOEXCEPT was created in implicitly_declare_fn when declaring Foo& operator=(Foo&&) = default; in the test. The problem is that in resolve_overloa

Re: [PING] [PATCH] rust: avoid clobbering LIBS

2024-09-03 Thread Marc
Richard Biener writes: > On Wed, Aug 28, 2024 at 11:10 AM Marc wrote: >> >> Hello, >> >> Gentle reminder for this simple autoconf patch :) > > OK. > > Note that completely wiping LIBS might remove requirements detected earlier, > like some systems require explicit -lc for example. I would inste

Re: [PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-03 Thread Richard Biener
> On 03.09.2024 at 19:00, Tamar Christina wrote: > > Hi All, > > The meaning of the testcase was changed by passing it -fwrapv. The reason for > the test failures on some platforms was that the test was testing some > implementation-defined behavior wrt INT_MIN in generic code. > > Inst

[PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski
This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that: ``` if (a) goto B else goto C; B: goto C; C:

[PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Andrew Pinski
This moves the check for # of statements to copy in join to be the first check. This check is the cheapest check so it should be first. Plus add a print to the dump file since there was none beforehand. gcc/ChangeLog: * gimple-ssa-split-paths.cc (is_feasible_trace): Move check for

[PATCH] coros: mark .CO_YIELD as LEAF [PR106973]

2024-09-03 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu. OK for trunk? -- >8 -- We rely on .CO_YIELD calls being followed by an assignment (optionally) and then a switch/if in the same basic block. This implies that a .CO_YIELD can never end a block. However, since a call to .CO_YIELD is still a call, if

[PATCH] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski
This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that: ``` if (a) goto B else goto C; B: goto C; C:

[pushed] c++: add fixed test [PR109095]

2024-09-03 Thread Marek Polacek
Tested x86_64-pc-linux-gnu, applying to trunk. -- >8 -- Fixed by r13-6693. PR c++/109095 gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-class66.C: New test. --- gcc/testsuite/g++.dg/cpp2a/nontype-class66.C | 19 +++ 1 file changed, 19 insertions(+) create mode

[PATCH 4/4]AArch64: Define VECTOR_STORE_FLAG_VALUE.

2024-09-03 Thread Tamar Christina
Hi All, This defines VECTOR_STORE_FLAG_VALUE to CONST1_RTX for AArch64 so we simplify vector comparisons in AArch64. With this enabled res: movi v0.4s, 0 cmeq v0.4s, v0.4s, v0.4s ret is simplified to: res: mvni v0.4s, 0 ret NOTE: I don't really

[PATCH 3/4][rtl]: simplify boolean vector EQ and NE comparisons

2024-09-03 Thread Tamar Christina
Hi All, This adds vector constant simplification for EQ and NE. This is useful since the vectorizer generates a lot more vector compares now, in particular NE and EQ and so these help us optimize cases where the values were not known at GIMPLE but instead only at RTL. Bootstrapped Regtested on a

[PATCH 2/4]middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern

2024-09-03 Thread Tamar Christina
Hi All, Currently the vectorizer cheats when lowering COND_EXPR during bool recog. In the cases where the conditional is loop invariant or non-boolean it instead converts the operation back into GENERIC and hides much of the operation from the analysis part of the vectorizer. i.e. a ? b : c is

[PATCH 1/4]middle-end: have vect_recog_cond_store_pattern use pattern statement for cond if available

2024-09-03 Thread Tamar Christina
Hi All, When vectorizing a conditional operation we rely on the bool_recog pattern to hit and convert the bool of the operand to a valid mask. However we are currently not using the converted operand as this is in a pattern statement. This change updates it to look at the actual statement to be

Re: [PATCH] RISC-V: Optimize branches with shifted immediate operands

2024-09-03 Thread Jeff Law
On 9/2/24 7:52 AM, Jovan Vukic wrote: The patch adds a new instruction pattern to handle conditional branches with equality checks between shifted arithmetic operands. This pattern optimizes the use of shifted constants (with trailing zeros), making it more efficient. For the C code: void

[PATCH v2 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)

2024-09-03 Thread Kwok Cheung Yeung
This patch adds parsing and translation of the 'to' and 'from' clauses for the 'target update' construct in Fortran. From cfb6b76da5bba038d854d510a4fd44ddf4fa8f1f Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Mon, 2 Sep 2024 19:34:29 +0100 Subject: [PATCH 5/5] openmp, fortran: Add support

[PATCH v2 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)

2024-09-03 Thread Kwok Cheung Yeung
This patch adds support for iterators in the map clause of OpenMP target constructs. The parsing and translation of iterators in the front-end works the same as for the affinity and depend clauses. The iterator gimplification needed to be modified slightly to handle Fortran. The difference i

[PATCH v2 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++)

2024-09-03 Thread Kwok Cheung Yeung
This patch extends the previous patch to cover to/from clauses in 'target update'. From c3dfc4a792610530a4ab729c3f250917b828e469 Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Mon, 2 Sep 2024 19:34:09 +0100 Subject: [PATCH 3/5] openmp: Add support for iterators in 'target update' clauses

[PATCH v2 2/5] openmp: Add support for iterators in map clauses (C/C++)

2024-09-03 Thread Kwok Cheung Yeung
This patch modifies the C and C++ parsers to accept an iterator as a map type modifier, encoded in the same way as the depend and affinity clauses. When finishing the clauses, clauses with iterators are treated separately from ones without to avoid clashes (e.g. iterating over x[i] will likely gen

[PATCH v2 1/5] openmp: Refactor handling of iterators

2024-09-03 Thread Kwok Cheung Yeung
This patch factors out the code to calculate the number of iterations required and to generate the iteration loop into separate functions from gimplify_omp_depend for reuse later. I have also replaced the 'TREE_CODE (*tp) == TREE_LIST && ...' checks used for detecting an iterator clause with a ma

[PATCH v2 0/5] openmp: Add support for iterators in OpenMP mapping clauses

2024-09-03 Thread Kwok Cheung Yeung
This is an improved version of the previous series that was posted at: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652680.html Compared to the previous version, this version delays the gimplification of iterators until the very end of gimplify_adjust_omp_clauses (instead of doing it in

[PATCH][docs]: [committed] remove double mention of armv9-a.

2024-09-03 Thread Tamar Christina
Hi All, The list of available architectures for Arm incorrectly lists armv9-a twice. This removes the duplicate armv9-a enumeration from the part of the list having M-profile targets. Committed under the obvious rule. Thanks, Tamar gcc/ChangeLog: * doc/invoke.texi: Remove duplicate

[PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-03 Thread Tamar Christina
Hi All, The meaning of the testcase was changed by passing it -fwrapv. The reason for the test failures on some platforms was that the test was testing some implementation-defined behavior wrt INT_MIN in generic code. Instead of using -fwrapv this just removes the border case from the test so

Zen5 tuning part 4: update reassociation width

2024-09-03 Thread Jan Hubicka
Hi, Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in 3 of them. FP units can do 2 additions and 2 multiplications with latency 2 and 3. This patch updates reassociation width accordingly. This has potential of increasing register pressure but unlike while benchmarki
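For readers unfamiliar with the term, reassociation width is roughly the number of independent dependency chains the compiler may split an associative reduction into. The sketch below illustrates the idea by hand in C++; the width of 4 and the function names are assumptions chosen for this example, not the values or code from the patch. Unsigned addition is used so both versions are guaranteed to produce the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One serial dependency chain: every add waits for the previous one.
std::uint64_t sum_serial(const std::vector<std::uint64_t>& v) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        acc += v[i];
    return acc;
}

// Four independent chains (an illustrative "width" of 4). The adds in
// one iteration do not depend on each other, so a core with several
// ALUs can execute them in parallel. The price is more live registers.
std::uint64_t sum_width4(const std::vector<std::uint64_t>& v) {
    std::uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; ++i)  // leftover elements
        a0 += v[i];
    return (a0 + a1) + (a2 + a3);
}
```

The extra accumulators are the register-pressure cost the message alludes to.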

Re: [PATCH] lto: Don't check obj.found for offload section

2024-09-03 Thread H.J. Lu
On Fri, Aug 23, 2024 at 5:50 AM Richard Biener wrote: > > On Fri, Aug 23, 2024 at 2:36 PM H.J. Lu wrote: > > > > obj.found is the number of LTO symbols. We should include the offload > > section when it is used by linker even if there are no LTO symbols. > > OK. > > > PR lto/116361 > >

[PATCH v8 2/2] aarch64: Add codegen support for AdvSIMD faminmax

2024-09-03 Thread saurabh.jha
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch adds code generation support for famax and famin in terms of existing R

[PATCH v8 1/2] aarch64: Add AdvSIMD faminmax intrinsics

2024-09-03 Thread saurabh.jha
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of this extensio

[PATCH v8 0/2] aarch64: Add support for AdvSIMD faminmax.

2024-09-03 Thread saurabh.jha
From: Saurabh Jha This series is a revised version of: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661860.html. The first patch of the series is updated to address these comments: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661866.html All comments are addressed exactly as s

[committed] libstdc++: Fix error handling in fs::hard_link_count for Windows

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- The recent change to use auto_win_file_handle for std::filesystem::hard_link_count caused a regression. The std::error_code argument should be cleared if no error occurs, but this no longer happens. Add a call to ec.clear() in fs::hard_link_count to
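The contract being restored here is the usual one for functions taking a std::error_code& out-parameter: on success, any stale error left in the argument must be reset. The sketch below shows that convention with a hypothetical stand-in function (query_link_count and its fail flag are invented for this example; it is not the libstdc++ implementation).

```cpp
#include <cstdint>
#include <system_error>

// Hypothetical stand-in for a platform query that reports errors
// through an out-parameter instead of throwing.
std::uintmax_t query_link_count(bool fail, std::error_code& ec) {
    if (fail) {
        ec = std::make_error_code(std::errc::no_such_file_or_directory);
        return static_cast<std::uintmax_t>(-1);
    }
    ec.clear();  // success path: reset whatever the caller passed in
    return 1;
}
```

Without the ec.clear() call, a caller that reuses one std::error_code across several calls would see an error from an earlier call and misread a successful result as a failure.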

[committed] libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- LWG 3736 added a partial specialization of this variable template for two std::move_iterator types. This is needed for the case where the types satisfy std::sentinel_for and are subtractable, but do not model the semantics requirements of std::sized_

[PATCH v1 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 5:36 PM Richard Sandiford wrote: >> In some cases, the alignment can be bigger than BIGGEST_ALIGNMENT. >> The patch handles these cases. >> >> gcc/ChangeLog: >> >>* config/aarch64/aarch64-coff.h (ASM_OUTPUT_ALIGNED_LOCAL): >>Change alignment. > > Can you

[PATCH] d, ada/spec: only sub nostd{inc, lib} rather than nostd{inc, lib}*

2024-09-03 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu. OK for trunk? -- >8 -- This prevents the gcc driver erroneously accepting -nostdlib++ when it should not when Ada was enabled. Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either the Ada or D compiler, so the spec should not subs

[PATCH v1 4/9] aarch64: Exclude symbols using GOT from code models

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 5:00 PM Richard Sandiford wrote: > I think we should instead patch the callers that are using > aarch64_symbol_binds_local_p for GOT decisions. The function itself > is checking for a more general property (and one that could be useful > in other contexts). The patch h

[PATCH] libcpp: Implement the strict reading of the #embed expansion rules

2024-09-03 Thread Jakub Jelinek
Hi! The following patch attempts to implement the current wording of the C23 #embed expansion rules on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661901.html patch (haven't yet adjusted the rest of the series, but I expect only minor tweaks). After parsing #embed it first che

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Jakub Jelinek
On Tue, Sep 03, 2024 at 01:59:45PM +, Qing Zhao wrote: > Hi, Jakub, > > I’d like to ping this simple patch again. It’s based on your suggestion in > PR116016 > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c28 > > Could you please take a look at the patch and let me know whether it

[committed] libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- We don't need to use std::aligned_storage in std::any. We just need a POD type of the right size. The void* union member already ensures the alignment will be correct. Avoiding std::aligned_storage means we don't need to suppress a -Wdeprecated-decla
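The reasoning about size and alignment can be made concrete with a small sketch. The union below is an assumption modelled on the description in the message (a void* member next to a raw byte buffer); the names Storage and fits_in_place are illustrative, and the real std::any applies further conditions before storing a value in place.

```cpp
#include <cstddef>

// A pointer-sized buffer with no std::aligned_storage involved.
// The void* member forces the union's alignment to that of a pointer,
// so the byte array is suitably aligned for any pointer-aligned type.
union Storage {
    void* ptr;                             // heap-allocated values
    unsigned char buffer[sizeof(void*)];   // small values stored in place
};

static_assert(sizeof(Storage) == sizeof(void*),
              "buffer adds no size beyond the pointer");
static_assert(alignof(Storage) == alignof(void*),
              "alignment comes from the void* member");

// Illustrative check only: size and alignment must both fit.
template <typename T>
constexpr bool fits_in_place() {
    return sizeof(T) <= sizeof(Storage) && alignof(T) <= alignof(Storage);
}
```

A plain unsigned char array on its own would only have alignment 1, which is why the pointer member matters.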

Zen5 tuning part 3: scheduler tweaks

2024-09-03 Thread Jan Hubicka
Hi, this patch adds support for new fusion in znver5 documented in the optimization manual: The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen: - The MOV should be reg-r

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Qing Zhao
Hi, Jakub, I’d like to ping this simple patch again. It’s based on your suggestion in PR116016 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c28 Could you please take a look at the patch and let me know whether its okay for committing to trunk? thanks. Qing > On Aug 12, 2024, at 09:5

Ping * 4: [PATCH v2] Provide more contexts for -Warray-bounds warning messages

2024-09-03 Thread Qing Zhao
Hi, Richard, I’d like to ping this patch again. For the convenience, the original 2nd version of the patch is at: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657150.html The diagnostic part has been reviewed by David. Could you please take a look at the middle end implementation and le

Re: [PATCH v1 1/2] Match: Add int type fits check for form 1 of .SAT_SUB imm operand

2024-09-03 Thread Jeff Law
On 9/1/24 11:52 PM, pan2...@intel.com wrote: From: Pan Li This patch would like to add a strict check for the imm operand of .SAT_SUB matching. We had no type checking for the imm operand previously, which may result in unexpected IL being caught by the .SAT_SUB pattern. We leverage the int_fits_type

Re: [PATCH v1] RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD

2024-09-03 Thread Jeff Law
On 9/2/24 5:27 AM, pan2...@intel.com wrote: From: Pan Li This patch would like to allow the IMM operand of the unsigned scalar .SAT_ADD. Like the operand 0, the operand 1 of .SAT_ADD will be zero extended to Xmode before underlying code generation. The below test suites are passed for this

[PATCH] Dump whether a SLP node represents load/store-lanes

2024-09-03 Thread Richard Biener
This makes it easier to discover whether SLP load or store nodes participate in load/store-lanes accesses. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. * tree-vect-slp.cc (vect_print_slp_tree): Annotate load and store-lanes nodes. --- gcc/tree-vect-slp

[PATCH] Fix missed peeling for gaps with SLP load-lanes

2024-09-03 Thread Richard Biener
The following disables peeling for gap avoidance with using smaller vector accesses when using load-lanes. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. * tree-vect-stmts.cc (get_group_load_store_type): Only disable peeling for gaps by using smaller vect

Re: Zen5 tuning part 2: disable gather and scatter

2024-09-03 Thread Richard Biener
On Tue, Sep 3, 2024 at 3:07 PM Jan Hubicka wrote: > > Hi, > We disable gathers for zen4. It seems that gather has improved a bit compared > to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when > the indices are known ahead of time. Vector loads followed by shuffles result

Zen5 tuning part 2: disable gather and scatter

2024-09-03 Thread Jan Hubicka
Hi, We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to

Re: [PATCH] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-03 Thread Christophe Lyon
Hi Torbjörn, On 9/3/24 11:30, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- Some of the test cases were scanning for "bti", but it would, incorrectly, match the ".arch_extension pacbti". Also, keep test cases active if a supported Cortex-M core is supplied. gcc/testsuite/Ch

[PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-03 Thread pan2 . li
From: Pan Li This patch would like to support the form 2 of the scalar signed integer .SAT_ADD. Aka below example: Form 2: #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \ T __attribute__((noinline)) \ sat_s_add_##T##_fmt_2 (T x, T y) \ {
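For context, a signed saturating add clamps the result to the type's range instead of overflowing. The macro body for form 2 is cut off in the preview above, so the sketch below is not that form; it is a generic widening-based version written for this listing, with the name sat_add_i32 chosen for illustration.

```cpp
#include <cstdint>
#include <limits>

// Signed saturating add for 32-bit integers: compute in 64 bits,
// then clamp to the int32_t range.
std::int32_t sat_add_i32(std::int32_t x, std::int32_t y) {
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int64_t wide = static_cast<std::int64_t>(x) + y;
    if (wide > hi) return static_cast<std::int32_t>(hi);
    if (wide < lo) return static_cast<std::int32_t>(lo);
    return static_cast<std::int32_t>(wide);
}
```

The forms matched in match.pd express the same semantics without a wider type, which is what lets them map to a single .SAT_ADD internal function.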

Zen5 tuning part 1: avoid FMA chains

2024-09-03 Thread Jan Hubicka
Hi, testing matrix multiplication benchmarks shows that FMA on a critical chain is a performance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2) the problem is that all values need to be ready before computation starts. While on znver4 AVX512 code fa
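The latency argument can be worked through with a back-of-the-envelope model. The sketch below uses the latencies quoted in the message (FMA 4, multiply 3, add 2) and assumes, for simplicity, that the multiplies do not depend on the accumulator and that execution resources are unlimited; the function names are made up for this example. It counts cycles on the critical path of an n-step accumulation.

```cpp
// Latencies as quoted in the message; everything else is a
// simplifying assumption.
constexpr long kFmaLatency = 4;
constexpr long kMulLatency = 3;
constexpr long kAddLatency = 2;

// acc = fma(a[i], b[i], acc): each FMA waits for the previous
// accumulator, so the whole FMA latency sits on the chain.
constexpr long fma_chain_cycles(long n) {
    return n * kFmaLatency;
}

// t = a[i] * b[i]; acc = acc + t: the multiplies are independent of
// acc and can run ahead, so only the first multiply and the adds
// remain on the chain.
constexpr long mul_add_chain_cycles(long n) {
    return n == 0 ? 0 : kMulLatency + n * kAddLatency;
}
```

For a single step the FMA wins (4 cycles against 5), but for a chain of 100 steps the model gives 400 cycles with FMA against 203 with separate multiply and add.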

[PATCH][v2] RISC-V: Also lower SLP grouped loads with just one consumer

2024-09-03 Thread Richard Biener
This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and other permutes that otherwise would use more than three vectors. It exposes the latent issue that single-element interleaving with large gaps can be inefficient - the mitigation in get_group_load_stor

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Jan Hubicka
> > > > > > PR ipa/116410 > > > * ipa-modref.cc (analyze_parms): Always analyze function parameter > > > for LTO streaming. > > > > > > Signed-off-by: H.J. Lu > > > --- > > > gcc/ipa-modref.cc | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff

[committed] MAINTAINERS: Update my email address

2024-09-03 Thread Szabolcs Nagy
* MAINTAINERS: Update my email address and add myself to DCO. --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 07ea5f5b6e1..cfd96c9f33e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -676,7 +676,7 @@ Christoph Müllner

[Patch, rs6000, middle-end] v10: Add implementation for different targets for pair mem fusion

2024-09-03 Thread Ajit Agarwal
Hello Richard: This patch addresses all the review comments. It also fixes the arm build failure. Common infrastructure using generic code for pair mem fusion of different targets. rs6000 target-specific code implements virtual functions defined by generic code. Target-specific code is added in r

[PATCH] RISC-V: Also lower SLP grouped loads with just one consumer

2024-09-03 Thread Richard Biener
This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and other permutes that otherwise would use more than three vectors. It exposes the latent issue that single-element interleaving with large gaps can be inefficient - the mitigation in get_group_load_stor

[PATCH] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-03 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14? -- Some of the test cases were scanning for "bti", but it would, incorrectly, match the ".arch_extension pacbti". Also, keep test cases active if a supported Cortex-M core is supplied. gcc/testsuite/ChangeLog: * gcc.target/arm/bti-1.c: Enable for Cor

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Jennifer Schmitz
> On 3 Sep 2024, at 10:39, Richard Biener wrote: > > On Tue, 3 Sep 2024, Andrew Pinski wrote: > >> On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: >>> >>> This patch implements constant folding of binary operations for S

[PATCH v1 3/9] aarch64: Add minimal C++ support

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 3:15 PM Kyrylo Tkachov wrote: >> libstdc++-v3/ChangeLog: >> >>        * src/c++17/fast_float/fast_float.h (defined): Adjust a condition >>        for AArch64. > > libstdc++ is reviewed on its own list (CC’ed here) so I’d suggest splitting > the libstdc++-v3 hunk into its

[r15-3392 Regression] FAIL: gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c (test for excess errors) on Linux/x86_64

2024-09-03 Thread haochen.jiang
On Linux/x86_64, 62df24e50039ae04aa3b940e680cffd9041ef5bf is the first bad commit commit 62df24e50039ae04aa3b940e680cffd9041ef5bf Author: Levy Hsu Date: Tue Aug 27 14:22:20 2024 +0930 i386: Support partial vectorized V2BF/V4BF smaxmin caused FAIL: gcc.target/i386/avx10_2-512-bf-vector-sm

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Richard Biener
On Mon, Sep 2, 2024 at 4:23 AM H.J. Lu wrote: > > On Tue, Aug 27, 2024 at 1:11 PM H.J. Lu wrote: > > > > Update analyze_parms not to disable function parameter analysis for > > -ffat-lto-objects. Tested on x86-64, there are no differences in zstd > > with "-O2 -flto=auto -g" vs "-O2 -flto=auto -

Re: [PATCH] Do not assert NUM_POLY_INT_COEFFS != 1 early

2024-09-03 Thread Jakub Jelinek
On Tue, Sep 03, 2024 at 10:42:34AM +0200, Richard Biener wrote: > The following moves the assert on NUM_POLY_INT_COEFFS != 1 after > INTEGER_CST processing. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, pushed > as obvious after getting into stage3. > > * fold-const.cc (pol

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Sam James
"H.J. Lu" writes: > On Tue, Aug 27, 2024 at 1:11 PM H.J. Lu wrote: >> >> Update analyze_parms not to disable function parameter analysis for >> -ffat-lto-objects. Tested on x86-64, there are no differences in zstd >> with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects". >> >>

[PATCH] Do not assert NUM_POLY_INT_COEFFS != 1 early

2024-09-03 Thread Richard Biener
The following moves the assert on NUM_POLY_INT_COEFFS != 1 after INTEGER_CST processing. Bootstrap and regtest running on x86_64-unknown-linux-gnu, pushed as obvious after getting into stage3. * fold-const.cc (poly_int_binop): Move assert on NUM_POLY_INT_COEFFS after INTEGER_CST p

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Richard Biener
On Tue, 3 Sep 2024, Andrew Pinski wrote: > On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: > > > > This patch implements constant folding of binary operations for SVE > > intrinsics > > by calling the constant-folding mechanism of the middle-end for a given > > tree_code. > > In fold-con

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-03 Thread Richard Biener
On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote: > > -Original Message- > > From: Richard Biener > > Sent: Monday, September 2, 2024 12:47 PM > > To: Prathamesh Kulkarni > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to > > accelerator > >

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Andrew Pinski
On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: > > This patch implements constant folding of binary operations for SVE intrinsics > by calling the constant-folding mechanism of the middle-end for a given > tree_code. > In fold-const.cc, the code for folding vector constants was moved from

[COMMITTED 09/10] ada: Plug loophole exposed by previous change

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The change causes more temporaries to be created at call sites for unaligned actual parameters, thus revealing that the machinery does not properly deal with unconstrained nominal subtypes for them. gcc/ada/ * gcc-interface/trans.cc (create_temporary): Deal with type

[COMMITTED 06/10] ada: Fix internal error on pragma pack with discriminated record component

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou When updating the size after making a packable type in gnat_to_gnu_field, we fail to clear it again when it is not constant. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size after updating it if it is not constant. Tested on x86_64-

[COMMITTED 07/10] ada: Pass unaligned record components by copy in calls on all platforms

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou This has historically been done only on platforms requiring the strict alignment of memory references, but this can arguably be considered as being mandated by the language on all of them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) : Take into account

[COMMITTED 05/10] ada: Simplify Note_Uplevel_Bound procedure

2024-09-03 Thread Marc Poulhiès
The procedure Note_Uplevel_Bound was implemented as a custom expression tree walk. This change replaces this custom tree traversal by a more idiomatic use of Traverse_Proc. gcc/ada/ * exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor to use the generic Traverse_Proc.

[COMMITTED 10/10] ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly may give the wrong answer for them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) : Add kludge to cope with

[COMMITTED 08/10] ada: Fix internal error with Atomic Volatile_Full_Access object

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specifi

[COMMITTED 04/10] ada: Transform Length attribute references for non-Strict overflow mode.

2024-09-03 Thread Marc Poulhiès
From: Steve Baird The non-strict overflow checking code does a better job of eliminating overflow checks if given an expression consisting only of predefined operators (including relationals), literals, identifiers, and conditional expressions. If it is both feasible and useful, rewrite a Length

[COMMITTED 03/10] ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specifi

[COMMITTED 01/10] ada: Fix Finalize_Storage_Only bug in b-i-p calls

2024-09-03 Thread Marc Poulhiès
From: Bob Duff Do not pass null for the Collection parameter when Finalize_Storage_Only is in effect. If the collection is null in that case, we will blow up later when we deallocate the object. gcc/ada/ * exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call): Remove Finali
