Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-31 Thread Aldy Hernandez via Gcc-patches
Ping ping On Mon, Oct 24, 2022, 08:04 Aldy Hernandez wrote: > PING > > On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez wrote: > > > > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek wrote: > > > > > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > > > +// Like real_arithmetic

Re: Adding a new thread model to GCC

2022-10-31 Thread Eric Botcazou via Gcc-patches
> could you please refresh/recheck your patch for the current gcc master > and solve the objections noted in the thread? is it possible? I have attached a revised version of the original patch at: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html This reimplements the GNU threads

Re: [PATCH V2] [x86] Fix incorrect digit constraint

2022-10-31 Thread Uros Bizjak via Gcc-patches
On Mon, Oct 31, 2022 at 2:10 AM liuhongt wrote: > > >You have a couple of other patterns where operand 1 is matched to > >produce vmovddup insn. These are *avx512f_unpcklpd512 and > >avx_unpcklpd256. You can also remove expander in both > >cases. > > Yes, changed in V2 patch. > > Bootstrapped and

Re:[pushed] [PATCH v4] Libvtv: Add loongarch support.

2022-10-31 Thread Lulu Cheng
Pushed to r13-3571. On 2022/10/29 at 2:53 PM, Lulu Cheng wrote: v1 -> v2: 1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K. 2. In the vtv_malloc.cc file __vtv_malloc_init function, it does not check whether VTV_PAGE_SIZE is equal to the system page size, if the macro

Re: [PATCH] Enable more optimization for 32-bit/64-bit shrd/shld with imm shift count.

2022-10-31 Thread Uros Bizjak via Gcc-patches
On Mon, Oct 31, 2022 at 2:25 AM liuhongt wrote: > > This patch doesn't handle variable count since it requires 5 insns to > be combined to get wanted pattern, but current pass_combine only > supports at most 4. > This patch doesn't handle 16-bit shrd/shld either. > > Ideally, we can avoid redundanc

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-31 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Wed, 26 Oct 2022 at 21:07, Richard Sandiford > wrote: >> >> Sorry for the slow response. I wanted to find some time to think >> about this a bit more. >> >> Prathamesh Kulkarni writes: >> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford >> > wrote: >> >> >> >

Re: [PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

2022-10-31 Thread Mikael Morin
On 30/10/2022 at 22:25, Mikael Morin wrote: On 30/10/2022 at 20:23, Mikael Morin wrote: Another probable issue is that your change to create_function_arglist changes arglist/hidden_arglist without also changing typelist/hidden_typelist accordingly. I think a change to gfc_get_function_type is

Re: [PATCH] libstdc++: Small extended float support tweaks

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Fri, 21 Oct 2022 at 08:29, Jakub Jelinek wrote: > > Hi! > > The following patch isn't for immediate commit, as it has several > dependencies, in particular: > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603665.html > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604080.html

Re: [PATCH] libstdc++-v3: support for extended floating point types

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Fri, 21 Oct 2022 at 16:58, Jakub Jelinek wrote: > > Hi! > > The following patch adds support for extended floating point > types. > C++23 removes the float/double/long double specializations from the spec > and instead adds explicit(bool) specifier on the converting constructor. > The patch us

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-31 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public] Hi, > It is not latency. It is reciprocal throughput. For example, the > multiplication instruction has > latency 3 and reciprocal throughput 1, and the corresponding execution unit > can accept a new > multiplication instruction each cycle. In the .md file we are modeling that > by s

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-31 Thread Jan Hubička via Gcc-patches
Hello, thanks for checking the performance. The patch is OK. Honza On Mon, Oct 31, 2022 at 11:39 AM Joshi, Tejas Sanjay < tejassanjay.jo...@amd.com> wrote: > [Public] > > Hi, > > > It is not latency. It is reciprocal throughput. For example, the > multiplication instruction has > > latency 3 and

Update email address

2022-10-31 Thread Ramana Radhakrishnan via Gcc-patches
As $subject. Pushed to trunk. Regards, Ramana diff --git a/MAINTAINERS b/MAINTAINERS index e4e7349a6d9..55c5ef95806 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -60,7 +60,7 @@ arc port Joern Rennecke arc port Claudiu Zissulescu arm port

RE: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.

2022-10-31 Thread Tamar Christina via Gcc-patches
> > The type of the expression should be available via the mode and the > signedness, no? So maybe to avoid having both RTX and TREE on the target > hook pass it a wide_int instead for the divisor? > Done. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok

RE: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division

2022-10-31 Thread Tamar Christina via Gcc-patches
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, September 23, 2022 10:34 AM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > Marcus Shawcroft ; Kyrylo Tkachov > ; Richard Sandiford > > Subject: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask > d

RE: [PATCH 2/4]AArch64 Add implementation for pow2 bitmask division.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Ping, and updated patch based on mid-end changes. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): New. * config/aarch64/aarch64.cc (aarch64_vectorize

RE: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT

2022-10-31 Thread Tamar Christina via Gcc-patches
Ping > -Original Message- > From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar > Christina via Gcc-patches > Sent: Friday, September 23, 2022 10:34 AM > To: gcc-patches@gcc.gnu.org > Cc: Richard Earnshaw ; nd ; > Richard Sandiford ; Marcus Shawcroft > > Sub

RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This is a respin with all feedback addressed. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add fneg/fadd rule. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/addsub_1.c: New test.

RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi, This is a cleaned up version addressing all feedback. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add new rule. gcc/testsuite/ChangeLog: * gcc.target/aarch64/if-compare_1.c:

RE: [PATCH]middle-end Recognize more conditional comparisons idioms.

2022-10-31 Thread Tamar Christina via Gcc-patches
> > This moves the pattern detection to match.pd instead. > > where's the other copy and is it possible to remove it with this patch? > It looks like it's spread over various passes. Starting with forwardprop. > > > + (simplify > > + (bit_ior:c > > + (mult:c @0 (convert (convert2? (op@4 @2

RE: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-10-31 Thread Tamar Christina via Gcc-patches
> > The same thing ought to work for smov, so it would be good to do both. > That would also make the split between the original and new patterns more > obvious: left shift for the old pattern, right shift for the new pattern. > Done, though because umov can do multilevel extensions I couldn't c

RE: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Here's a respin addressing review comments. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add bitfield and shift folding. gcc/testsuite/ChangeLog: * gcc.dg/bitshift_1.c: Ne

[PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This adds a new test-and-branch optab that can be used to do a conditional test of a bit and branch. This is similar to the cbranch optab but instead can test any arbitrary bit inside the register. This patch recognizes boolean comparisons and single bit mask tests. Bootstrapped Regtes

[PATCH 2/2]AArch64 Support new tbranch optab.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This implements the new tbranch optab for AArch64. Instead of emitting the instruction directly I've chosen to expand the pattern using a zero extract and generating the existing pattern for comparisons for two reasons: 1. Allows for CSE of the actual comparison. 2. It looks like the

[PATCH]AArch64 Extend umov and sbfx patterns.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Our zero and sign extend and extract patterns are currently very limited and only work for the original register size of the instructions. i.e. limited by GPI patterns. However these instructions extract bits and extend. This means that any register size can be used as an input as long a

[PATCH 2/8]middle-end: Recognize scalar widening reductions

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This adds a new optab and IFNs for REDUC_PLUS_WIDEN where the resulting scalar reduction has twice the precision of the input elements. At some point in a later patch I will also teach the vectorizer to recognize this builtin once I figure out how the various bits of reductions work. For

[PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The current vector extract pattern can only extract from a vector when the position to extract is a multiple of the vector bitsize as a whole. That means extracting something like a V2SI from a V4SI vector from position 32 isn't possible as 32 is not a multiple of 64. Ideally this optab sho

[PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This patch series is to add recognition of pairwise operations (reductions) in match.pd such that we can benefit from them even at -O1 when the vectorizer isn't enabled. The use of these allows for much simpler codegen in AArch64 and allows us to avoid quite a lot of codegen warts. As an

[PATCH 4/8]AArch64 aarch64: Implement widening reduction patterns

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This implements the new widening reduction optab in the backend. Instead of introducing a duplicate definition for the same thing I have renamed the intrinsics definitions to use the same optab. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar

[PATCH 6/8]AArch64: Add peephole and scheduling logic for pairwise operations that appear late in RTL.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Says what it does on the tin. In case some operations form in RTL due to a split, combine or any RTL pass then still try to recognize them. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-sim

[PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The backend has an existing V2HFmode that is used by pairwise operations. This mode was however never made fully functional. Amongst other things it was never declared as a vector type which made it unusable from the mid-end. It's also lacking an implementation for load/stores so reload

[PATCH 7/8]AArch64: Consolidate zero and sign extension patterns and add missing ones.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The target has various zero and sign extension patterns. These however live in various locations around the MD file and almost all of them are split differently. Due to the various patterns we also ended up missing valid extensions. For instance smov is almost never generated. This cha

[PATCH 8/8]AArch64: Have reload not choose to do add on the scalar side if both values exist on the SIMD side.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Currently we often times generate an r -> r add even if it means we need two reloads to perform it, i.e. in the case that the values are on the SIMD side. The pairwise operations expose these more now and so we get suboptimal codegen. Normally I would have liked to use ^ or $ here, but w

Re: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-10-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Our zero and sign extend and extract patterns are currently very limited and > only work for the original register size of the instructions. i.e. limited by > GPI patterns. However these instructions extract bits and extend. This means > that any register si

RE: [GCC][PATCH v2] arm: Add cde feature support for Cortex-M55 CPU.

2022-10-31 Thread Srinath Parvathaneni via Gcc-patches
Hi, > -Original Message- > From: Christophe Lyon > Sent: Monday, October 17, 2022 2:30 PM > To: Srinath Parvathaneni ; gcc- > patc...@gcc.gnu.org > Cc: Richard Earnshaw > Subject: Re: [GCC][PATCH] arm: Add cde feature support for Cortex-M55 > CPU. > > Hi Srinath, > > > On 10/10/22 10:

[committed] amdgcn: Silence unused parameter warning

2022-10-31 Thread Andrew Stubbs
A function parameter was left over from a previous draft of my multiple-vector-length patch. This patch silences the harmless warning. Andrew amdgcn: Silence unused parameter warning gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Set base_type a

[committed] amdgcn: add fmin/fmax patterns

2022-10-31 Thread Andrew Stubbs
This patch adds patterns for the fmin and fmax operators, for scalars, vectors, and vector reductions. The compiler uses smin and smax for most floating-point optimizations, etc., but not where the user calls fmin/fmax explicitly. On amdgcn the hardware min/max instructions are already IEEE c

[committed] amdgcn: multi-size vector reductions

2022-10-31 Thread Andrew Stubbs
My recent patch to add additional vector lengths didn't address the vector reductions yet. This patch adds the missing support. Shorter vectors use fewer reduction steps, and the means to extract the final value has been adjusted. Lacking from this is any useful costs, so for loops the vect p

Re: [PATCH] RISC-V: Change constexpr back to CONSTEXPR

2022-10-31 Thread Kito Cheng via Gcc-patches
Committed, thanks! On Fri, Oct 28, 2022 at 6:47 AM Jeff Law via Gcc-patches wrote: > > > On 10/27/22 08:41, juzhe.zh...@rivai.ai wrote: > > From: Ju-Zhe Zhong > > > > According to > > https://github.com/gcc-mirror/gcc/commit/f95d3d5de72a1c43e8d529bad3ef59afc3214705. > > Since GCC 4.8.6 doesn't

[Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-10-31 Thread Chung-Lin Tang
Ping x2. On 2022/10/17 10:29 PM, Chung-Lin Tang wrote: > Ping. > > On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote: >> Hi Tom, >> I had a patch submitted earlier, where I reported that the current way of >> implementing >> barriers in libgomp on nvptx created a quite significant perfo

Re: [PATCH Rust front-end v3 01/46] Use DW_ATE_UTF for the Rust 'char' type

2022-10-31 Thread Tom Tromey via Gcc-patches
> "Mark" == Mark Wielaard writes: Mark> DW_LANG_Rust_old was used by old rustc compilers <= 2016 before DWARF5 Mark> assigned an official number. It might be recognized by some Mark> debuggers. FWIW I wouldn't worry about it any more. We could probably just remove the '_old' constant. Tom

[Patch] OpenMP/Fortran: 'target update' with strides + DT components

2022-10-31 Thread Tobias Burnus
I recently saw that gfortran does not support derived type components with 'target update', an OpenMP 5.0 feature. When adding it, I also found out that strides were not handled. There is probably some room for improvement about what to copy and what not, but copying too much should be fine. Bui

Re: Adding a new thread model to GCC

2022-10-31 Thread i.nixman--- via Gcc-patches
On 2022-10-31 09:18, Eric Botcazou wrote: Hi Eric! thank you very much for the job! I will try to build our (MinGW-Builds project) builds using this patch and will report back. @Jonathan what the next steps to be taken to accept this patch? best! I have attached a revised version of th

[ada, patch] fix libgnat build on x86_64-linux-gnux32 with glibc <= 2.31

2022-10-31 Thread Matthias Klose
This was introduced with the fix and backports of PR103530 on x86_64-linux-gnux32 with older glibc versions (checked with 2.31), where dladdr is still in the libdl.so library, and not included in libc.so as in newer glibc versions. Linking of libgnat.so fails with [...] /usr/x86_64-linux-gnux3

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Eric Botcazou via Gcc-patches
> The test was only failing for me with -m32 (and not -m64), so I didn't > notice until now. That probably means we should make the test fail more > reliably if the padding isn't being cleared. The tests fail randomly for me on SPARC64/Linux: FAIL: 29_atomics/atomic/compare_exchange_padding.cc ex

[GCC][PATCH v2] arm: Add pacbti related multilib support for armv8.1-m.main.

2022-10-31 Thread Srinath Parvathaneni via Gcc-patches
Hi, This patch adds support for pacbti multilib linking by making "-mbranch-protection=none" the default in the command line for all M-profile targets and uses "-mbranch-protection=none" for multilib matching. If any valid value is passed to "-mbranch-protection" in the command line, this new

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 15:34, Eric Botcazou wrote: > > > The test was only failing for me with -m32 (and not -m64), so I didn't > > notice until now. That probably means we should make the test fail more > > reliably if the padding isn't being cleared. > > The tests fail randomly for me on SPARC64

optabs: Variable index vec_set

2022-10-31 Thread Robin Dapp via Gcc-patches
Hi, I'm looking into vec_set with variable index on s390. Uros posted a patch [1] that did not make it upstream in Nov 2020. It changed the mode of the index operand to whatever the target supports in can_vec_set_var_idx_p. I missed it back then but we indeed do not make proper use of vec_set w

[PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

2022-10-31 Thread Daniel Engel
Hi Richard, I am re-submitting my libgcc patch from 2021: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html I believe I have finally made the stage1 window. Regards, Daniel --- Changes since v6:

[PATCH v7 01/34] Add and restructure function declaration macros

2022-10-31 Thread Daniel Engel
Most of these changes support subsequent patches in this series. Particularly, the FUNC_START macro becomes part of a new macro chain: * FUNC_ENTRY Common global symbol directives * FUNC_START_SECTION FUNC_ENTRY to start a new * FUNC_START FUNC_START_SECTION <

[PATCH v7 04/34] Reorganize LIB1ASMFUNCS object wrapper macros

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 + 1 file changed, 53 insertions

[PATCH v7 07/34] Refactor 'ctz' functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++ libgcc/co

[PATCH v7 06/34] Refactor 'clz' functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__clzsi2i, __clzdi2): Moved to ... * config/arm/clz2.S: New file. --- libgcc/config/arm/clz2.S | 145 ++

[PATCH v7 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY

2022-10-31 Thread Daniel Engel
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns more closely with the new FUNC_ENTRY macro and is renamed accordingly. THUMB_FUNC_START usage has been universally synonymous with the ".force_thumb" directive, so this is now folded into the definition. Usage of ".force_thumb"

[PATCH v7 05/34] Add the __HAVE_FEATURE_IT and IT() macros

2022-10-31 Thread Daniel Engel
These macros complement and extend the existing do_it() macro. Together, they streamline the process of optimizing short branchless conditional sequences to support ARM, Thumb-2, and Thumb-1. The inherent architecture limitations of Thumb-1 mean that writing assembly code is somewhat more tedious

[PATCH v7 15/34] Import 'popcnt' functions from the CM0 library

2022-10-31 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same application. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/popcnt.S (__popcountsi, __popcoun

[PATCH v7 08/34] Refactor 64-bit shift functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshldi3): Moved to ... * config/arm/eabi/lshift.S: New file. --- libgcc/config/arm/eabi/lshift.S | 123 +

[PATCH v7 03/34] Fix syntax warnings on conditional instructions

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY, _interwork_call_via_lr): Moved condition code after the flags update specifier "s". (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax". --- libgcc/c

[PATCH v7 09/34] Import 'clz' functions from the CM0 library

2022-10-31 Thread Daniel Engel
On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves

[PATCH v7 17/34] Import 64-bit comparison from CM0 library

2022-10-31 Thread Daniel Engel
These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced; add macro configuration to build _

[PATCH v7 10/34] Import 'ctz' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C. On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequen

[PATCH v7 11/34] Import 64-bit shift functions from the CM0 library

2022-10-31 Thread Daniel Engel
The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Eng

[PATCH v7 18/34] Merge Thumb-2 optimizations for 64-bit comparison

2022-10-31 Thread Daniel Engel
This effectively merges support for all architecture variants into a common function path with appropriate build conditions. ARM performance is 1-2 instructions faster; Thumb-2 is about 50% faster. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_

[PATCH v7 13/34] Import 'ffs' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This implementation provides an efficient tail call to __clzdi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions. * config/arm/t-elf (LIB1ASMFUNCS): Ad

[PATCH v7 12/34] Import 'clrsb' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf

[PATCH v7 16/34] Refactor Thumb-1 64-bit comparison into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- l

[PATCH v7 21/34] Import 64-bit division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c.

[PATCH v7 14/34] Import 'parity' functions from the CM0 library

2022-10-31 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about half the size of the C functions if both functions are linked in the same application. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/parity.S: New file for __

[PATCH v7 19/34] Import 32-bit division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++ libgcc/config/arm/lib1funcs.S |

[PATCH v7 22/34] Import integer multiplication from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eab

[PATCH v7 25/34] Refactor Thumb-1 float subtraction into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ... * config/arm/eabi/fadd.S: New file. * config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only). --- libg

[PATCH v7 24/34] Import float comparison from the CM0 library

2022-10-31 Thread Daniel Engel
These functions are significantly smaller and faster than the wrapper functions and soft-float implementation they replace. Using the first comparison operator (e.g. '<=') in any program costs about 70 bytes initially, but every additional operator incrementally adds just 4 bytes. NOTE: It seems

[PATCH v7 20/34] Refactor Thumb-1 64-bit division into a new file

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 81 -

[PATCH v7 23/34] Refactor Thumb-1 float comparison into a new file

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * confi

[PATCH v7 26/34] Import float addition and subtraction from the CM0 library

2022-10-31 Thread Daniel Engel
Since this is the first import of single-precision functions, some common parsing and formatting routines are also included. These common routines will be referenced by other functions in subsequent commits. However, even if the size penalty is accounted entirely to __addsf3(), the total compiled s

[PATCH v7 33/34] Drop single-precision Thumb-1 soft-float functions

2022-10-31 Thread Daniel Engel
With the complete CM0 library integrated, regression testing showed new failures with the message "compilation failed to produce executable": gcc.dg/fixed-point/convert-float-1.c gcc.dg/fixed-point/convert-float-3.c gcc.dg/fixed-point/convert-sat.c Investigating, this appears to be ca

[PATCH v7 27/34] Import float multiplication from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fmul.S (__mulsf3): New file. * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope (this object was previously blocked on v6m build

[PATCH v7 28/34] Import float division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file. * config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf. --- libgcc/config/arm/eabi/fdiv.S | 26

[PATCH v7 31/34] Import float<->double conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file. * config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d. --- libgcc/config/arm/eabi/fcast.S | 2

[PATCH v7 29/34] Import integer-to-float conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (__floatdisf, __floatundisf): Remove obsolete RENAME_LIBRARY directives. * config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f, __aeabi_ul2f): New file. * config/arm/lib1fun

[PATCH v7 30/34] Import float-to-integer conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (muldi3): Removed duplicate. (fixunssfsi) Removed obsolete RENAME_LIBRARY directive. * config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz, __aeabi_f2lz, __aeabi_f2ulz): New file. * co

[PATCH v7 32/34] Import float<->__fp16 conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions. * config/arm/fp16 (__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative): Disable build for v6m multilibs. * config/arm/t-b

[PATCH v7 34/34] Add -mpure-code support to the CM0 functions.

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__. (gcc_compile): Appended MPURE_CODE. lib1funcs.S (FUNC_START_SECTION): Set flags for __PURE_CODE__. clz2.S (__clzsi2): Added -mpure-code compatible instructions.

Re: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:38, Tamar Christina via Gcc-patches wrote: Hi All, This is a respin with all feedback addressed. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add fneg/fadd rule. gcc/testsuite/ChangeLog:

Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:51, Tamar Christina via Gcc-patches wrote: Hi All, Here's a respin addressing review comments. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add bitfield and shift folding

Re: Ping^3 [PATCH V2] Add attribute hot judgement for INLINE_HINT_known_hot hint.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/30/22 19:44, Cui, Lili wrote: On 10/20/22 19:52, Cui, Lili via Gcc-patches wrote: Hi Honza, Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html gcc/ChangeLog * ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute judgement for INL

Re: [PATCH] libstdc++-v3: support for extended floating point types

2022-10-31 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 31, 2022 at 10:26:11AM +, Jonathan Wakely wrote: > > --- libstdc++-v3/include/std/complex.jj 2022-10-21 08:55:43.037675332 +0200 > > +++ libstdc++-v3/include/std/complex 2022-10-21 17:05:36.802243229 +0200 > > @@ -142,8 +142,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > > >/

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Eric Botcazou via Gcc-patches
> I suppose we could use memcmp on the as variable itself, to inspect > the actual stored padding rather than the returned copy of it. Yes, that's probably the only safe stance when optimization is enabled. -- Eric Botcazou
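The point about inspecting the stored object rather than a returned copy can be illustrated with a minimal sketch. It is not the libstdc++ test itself: the struct `Padded` and both helper functions are hypothetical names chosen for illustration, and the example assumes an ABI that places padding between the two members (checked by the static_assert). It shows that two objects can compare equal member by member while memcmp on the objects themselves still sees a difference in the padding bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type with padding between `c` and `i` on common ABIs.
struct Padded {
    char c;
    int i;
};
static_assert(offsetof(Padded, i) > sizeof(char),
              "example assumes at least one padding byte after c");

// Compare the full object representation, padding bytes included.
// Taking references avoids copies, which need not preserve padding.
bool same_representation(const Padded& a, const Padded& b) {
    return std::memcmp(&a, &b, sizeof(Padded)) == 0;
}

// Compare only the named members; padding is ignored.
bool same_members(const Padded& a, const Padded& b) {
    return a.c == b.c && a.i == b.i;
}

void example_usage() {
    Padded a;
    std::memset(&a, 0, sizeof a);      // zero every byte, padding included
    Padded b;
    std::memcpy(&b, &a, sizeof b);     // byte-for-byte copy
    assert(same_representation(a, b));

    // Byte 1 lies in the padding (guaranteed by the static_assert above).
    reinterpret_cast<unsigned char*>(&b)[1] = 0xff;
    assert(same_members(a, b));         // members still agree
    assert(!same_representation(a, b)); // but the stored bytes differ
}
```

Passing by reference matters here: once a value is copied or returned, the padding of the copy is no longer something the program can rely on.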

Re: [PATCH] libstdc++-v3: support for extended floating point types

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 16:57, Jakub Jelinek wrote: > > On Mon, Oct 31, 2022 at 10:26:11AM +, Jonathan Wakely wrote: > > > --- libstdc++-v3/include/std/complex.jj 2022-10-21 08:55:43.037675332 > > > +0200 > > > +++ libstdc++-v3/include/std/complex2022-10-21 17:05:36.802243229 > > > +0200

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 17:03, Eric Botcazou wrote: > > > I suppose we could use memcmp on the as variable itself, to inspect > > the actual stored padding rather than the returned copy of it. > > Yes, that's probably the only safe stance when optimization is enabled. Strictly speaking, it's not

Re: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:34, Tamar Christina wrote: The type of the expression should be available via the mode and the signedness, no? So maybe to avoid having both RTX and TREE on the target hook pass it a wide_int instead for the divisor? Done. Bootstrapped Regtested on aarch64-none-linux-gnu, x86

Re: [PATCH] Add __builtin_iseqsig()

2022-10-31 Thread Joseph Myers
On Fri, 28 Oct 2022, Jeff Law via Gcc-patches wrote: > Joseph, do you have bits in this space that are going to be landing soon, or > is your C2X work focused elsewhere?  Are there other C2X routines we need to > be proving builtins for? I don't have any builtins work planned for GCC 13 (maybe ad

Re: [PATCH v4] btf: Add support to BTF_KIND_ENUM64 type

2022-10-31 Thread Indu Bhagat via Gcc-patches
On 10/21/22 2:28 AM, Indu Bhagat via Gcc-patches wrote: On 10/19/22 19:05, Guillermo E. Martinez wrote: Hello, The following is patch v4 to update BTF/CTF backend supporting BTF_KIND_ENUM64 type. Changes from v3:    + Remove `ctf_enum_binfo' structure.    + Remove -m{little,big}-endian from dg

[PATCH] c, analyzer: support named constants in analyzer [PR106302]

2022-10-31 Thread David Malcolm via Gcc-patches
The analyzer's file-descriptor state machine tracks the access mode of opened files, so that it can emit -Wanalyzer-fd-access-mode-mismatch. To do this, its symbolic execution needs to "know" the values of the constants "O_RDONLY", "O_WRONLY", and "O_ACCMODE". Currently analyzer/sm-fd.cc simply u
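As a rough illustration of why the values of these constants matter, here is a minimal sketch of how an access mode can be derived from open() flags. This is not the code in analyzer/sm-fd.cc; the enum `AccessMode` and the functions below are hypothetical names, and the sketch assumes a POSIX <fcntl.h>.

```cpp
#include <cassert>
#include <fcntl.h>   // O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE, O_CREAT

// Hypothetical classification of a file descriptor's access mode.
enum class AccessMode { ReadOnly, WriteOnly, ReadWrite, Unknown };

// Mask the flags with O_ACCMODE and compare against the named constants.
// Without knowing the numeric values of these macros, this check
// cannot be evaluated.
AccessMode access_mode_from_flags(int flags) {
    const int mode = flags & O_ACCMODE;
    if (mode == O_RDONLY) return AccessMode::ReadOnly;
    if (mode == O_WRONLY) return AccessMode::WriteOnly;
    if (mode == O_RDWR)   return AccessMode::ReadWrite;
    return AccessMode::Unknown;
}

// A write through a descriptor opened read-only is a mode mismatch.
bool write_is_mismatch(int open_flags) {
    return access_mode_from_flags(open_flags) == AccessMode::ReadOnly;
}

void example_usage() {
    assert(write_is_mismatch(O_RDONLY));
    assert(!write_is_mismatch(O_WRONLY | O_CREAT));
}
```

Other flag bits such as O_CREAT do not affect the result, because only the bits selected by O_ACCMODE are compared.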

Re: [PATCH] Add __builtin_iseqsig()

2022-10-31 Thread FX via Gcc-patches
Hi, Just adding, from the Fortran 2018 perspective, things we will need to implement for which I think support from the middle-end might be necessary: - rounded conversions: converting, from an integer or floating point type, into another floating point type, with specific rounding mode passed

Re: [PATCH, v2] Fortran: ordering of hidden procedure arguments [PR107441]

2022-10-31 Thread Harald Anlauf via Gcc-patches
Hi Mikael, thanks a lot, your testcases broke my initial (and incorrect) patch in multiple ways. I understand now that the right solution is much simpler and smaller. I've added your testcases, see attached, with a simple scan of the dump for the generated order of hidden arguments in the funct

[PATCH] libstdc++: Implement ranges::as_rvalue_view from P2446R2

2022-10-31 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk? libstdc++-v3/ChangeLog: * include/std/ranges (as_rvalue_view): Define. (enable_borrowed_range): Define. (views::__detail::__can_as_rvalue_view): Define. (views::_AsRvalue, views::as_rvalue): Define.

[PATCH] x86: Track converted/skipped registers in STV

2022-10-31 Thread H.J. Lu via Gcc-patches
When converting integer computations into vector ones, we build a chain from an integer definition instruction together with all dependent use instructions. The integer computations on the chain are converted to vector ones if the total vector costs are lower than the integer ones. Since the same

Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:53, Tamar Christina wrote: Hi All, This adds a new test-and-branch optab that can be used to do a conditional test of a bit and branch. This is similar to the cbranch optab but instead can test any arbitrary bit inside the register. This patch recognizes boolean comparisons a
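For readers unfamiliar with the idiom, the following sketch shows the kind of source-level pattern a bit-test-and-branch operation targets: a branch that depends on a single bit of a register. The function names and the chosen bit index are hypothetical, and whether a given compiler version actually emits a single test-and-branch instruction for it (for example TBZ/TBNZ on AArch64) depends on the target and optimization level.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when bit number `bit` of `value` is set.
bool bit_is_set(std::uint64_t value, unsigned bit) {
    assert(bit < 64);  // shifting by 64 or more would be undefined
    return ((value >> bit) & 1u) != 0;
}

// A branch whose condition is a single bit of the input.
int classify(std::uint64_t flags) {
    if (bit_is_set(flags, 5))
        return 1;   // taken when bit 5 is set
    return 0;       // taken when bit 5 is clear
}
```

The contrast with a plain compare-and-branch is that the condition here is one arbitrary bit rather than the whole register value.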

Re: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:42, Tamar Christina via Gcc-patches wrote: Hi, This is a cleaned up version addressing all feedback. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add new rule. gcc/tests

Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:56, Tamar Christina wrote: Hi All, This patch series is to add recognition of pairwise operations (reductions) in match.pd such that we can benefit from them even at -O1 when the vectorizer isn't enabled. The use of these allows for a lot simpler codegen in AArch64 and allows us
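To make "pairwise operations" concrete, here is a minimal sketch of scalar code that reduces two adjacent array elements to one value. The struct `Pair` and the function names are hypothetical, and the snippet above is truncated, so this only illustrates the general shape of a scalar reduction over array references, not the exact patterns the series matches.

```cpp
#include <cassert>

// Hypothetical two-lane container accessed through array references.
struct Pair {
    float lane[2];
};

// Pairwise add: both lanes are combined into a single scalar.
float pairwise_add(const Pair& p) {
    return p.lane[0] + p.lane[1];
}

// Pairwise max: the same shape with a different combining operation.
float pairwise_max(const Pair& p) {
    return p.lane[0] > p.lane[1] ? p.lane[0] : p.lane[1];
}

void example_usage() {
    const Pair p{{1.5f, 2.25f}};
    assert(pairwise_add(p) == 3.75f);
    assert(pairwise_max(p) == 2.25f);
}
```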
