Re: [PATCH v5 0/3] Hard Register Constraints

2025-07-21 Thread Xi Ruoyao
with a simple loop. It's not whether string.h is included; it's whether string.h provides rawmemchr. rawmemchr is not a standard C function: it's a GNU extension, and GCC is expected to work on various non-GNU systems. -- Xi Ruoyao

[PATCH] LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]

2025-07-14 Thread Xi Ruoyao
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the same pseudo as op0 and/or op1. Loading the selector into target would clobber the input, producing wrong code like vld $vr0, $t0 vshuf.w $vr0, $vr0, $vr1 So don't load the selector into d->target, use a new pseudo to h

Re: [EXT] Re: [PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-12 Thread Xi Ruoyao
On Fri, 2025-07-11 at 14:01 -0500, Peter Bergner wrote: > On 7/11/25 10:22 AM, Vladimir Makarov wrote: > > On 7/8/25 9:43 PM, Xi Ruoyao wrote: > > > > > > IIUC "recog does not look at constraints until reload" has been a > > > well-established rule

[PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-08 Thread Xi Ruoyao
The PR 87600 fix has disallowed reloading user hard registers to resolve earlyclobber-induced conflict. However before reload, recog completely ignores the constraints of insns, so the RTL passes may produce insns where some user hard registers violate an earlyclobber. Then we'll get an ICE witho

[PATCH 1/2] testsuite: Enable the PR 87600 tests for LoongArch

2025-07-08 Thread Xi Ruoyao
I'm going to refine a part of the PR 87600 fix which seems to be triggering PR 120983, from which LoongArch is particularly suffering. Enable the PR 87600 tests so I'll not regress PR 87600. gcc/testsuite/ChangeLog: PR rtl-optimization/87600 PR rtl-optimization/120983 * gcc.dg/pr87600

[PATCH 0/2] Fix PR120983

2025-07-08 Thread Xi Ruoyao
Bootstrapped and regtested on aarch64-linux-gnu, loongarch64-linux-gnu, and x86_64-linux-gnu. Ok for trunk? Xi Ruoyao (2): testsuite: Enable the PR 87600 tests for LoongArch lra: Reallow reloading user hard registers if the insn is not asm [PR 120983] gcc/lra-constraints.cc

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
On Sat, 2025-07-05 at 14:10 -0500, Segher Boessenkool wrote: > Hi! > > On Sat, Jul 05, 2025 at 11:10:05PM +0800, Xi Ruoyao wrote: > > Possibly this is https://gcc.gnu.org/PR101882.  Specifically comment 5 > > from Segher: > > > > "The LRA change is corre

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
On Sat, 2025-07-05 at 17:55 +0800, Xi Ruoyao wrote: > On Sat, 2025-07-05 at 11:20 +0800, Lulu Cheng wrote: > > For the gcc.target/loongarch/bitwise-shift-reassoc-clobber.c, > > some extensions are eliminated in ext_dce in commit r16-1835. > > > > This will result

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
"register_operand" "r"] >    "TARGET_64BIT >     && loongarch_reassoc_shift_bitwise (, operands[2], operands[3], > -    SImode)" > +    SImode) > +   && !(GP_REG_P (REGNO (operands[0])) > + && REGNO (operands[0]) == REGNO (operands[4]))" >    "#" >    "&& reload_completed" >    [; r0 = r1 [&|^] r3 is emitted in PREPARATION-STATEMENTS because we -- Xi Ruoyao

Re: [PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-03 Thread Xi Ruoyao
On Fri, 2025-07-04 at 11:14 +0800, Xi Ruoyao wrote: > On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote: > > > > 在 2025/7/2 下午3:31, Xi Ruoyao 写道: > > > The register_operand predicate can match subreg, then we'd have a > > > subreg > > > of subreg

Re: [PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-03 Thread Xi Ruoyao
On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote: > > 在 2025/7/2 下午3:31, Xi Ruoyao 写道: > > The register_operand predicate can match subreg, then we'd have a subreg > > of subreg and it's invalid.  Use lowpart_subreg to avoid the nested > >

[PATCH] LoongArch: testsuite: Adapt bstrpick_alsl_paired.c for GCC 16 change

2025-07-03 Thread Xi Ruoyao
In GCC 16 the compiler is smarter and it optimizes away the unneeded zero-extension during the expand pass. Thus we can no longer match and_alsl_reversed. Drop the scan-rtl-dump for and_alsl_reversed and add scan-assembler-not against bstrpick.d to detect the unneeded zero-extension in case it re

[PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-02 Thread Xi Ruoyao
The register_operand predicate can match subreg, then we'd have a subreg of subreg and it's invalid. Use lowpart_subreg to avoid the nested subreg. gcc/ChangeLog: * config/loongarch/loongarch.md (crc_combine): Avoid nested subreg. gcc/testsuite/ChangeLog: * gcc.c-tortu

Re: [RFC PATCH v1 12/31] LoongArch: Forbid k, ZC constraints for movsi_internal

2025-06-10 Thread Xi Ruoyao
load,store,mgtf,fpload,mftg,fpstore") > +   (set_attr "mode" "SI")]) > + > +(define_insn_and_split "*movsi_internal_la32" > +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,*f,f,*r,*m") > + (match_operand:SI 1 "move_operand" "r,Yd,m,rJ,*r*J,m,*f,*f"))] > +  "TARGET_32BIT > +   && (register_operand (operands[0], SImode) > +   || reg_or_0_operand (operands[1], SImode))" >    { return loongarch_output_move (operands); } >    "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO >    (operands[0]))" -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 15/31] LoongArch: Disable k constraint on LA32

2025-06-10 Thread Xi Ruoyao
(op, 0), mode)"))) > +   (match_test "TARGET_64BIT && loongarch_base_index_address_p (XEXP > (op, 0), mode)"))) IMO it's more natural to do (and (match_code "mem") (match_test "TARGET_64BIT") (match_test "loongarch_b

Re: [RFC PATCH v1 16/31] LoongArch: Add support for atomic on LA32

2025-06-10 Thread Xi Ruoyao
> +} > +  [(set (attr "length") (const_int 20))]) >   >  (define_insn "atomic_exchange" >    [(set (match_operand:GPR 0 "register_operand" "=&r") > @@ -217,10 +244,21 @@ (define_insn "atomic_exchange" >      (match_operand:

Re: [RFC PATCH v1 25/31] LoongArch: macro instead enum for base abi type

2025-06-10 Thread Xi Ruoyao
n the lines below). > +#define ABI_BASE_ILP32F   1 > +#define ABI_BASE_ILP32S   2 > +#define ABI_BASE_LP64D   3 > +#define ABI_BASE_LP64F   4 > +#define ABI_BASE_LP64S   5 > +#define N_ABI_BASE_TYPES  6 >   >  extern loongarch_def_array >    loongarch_abi_base_strings; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 08/31] LoongArch: Forbid ADDRESS_REG_REG in loongarch32

2025-06-10 Thread Xi Ruoyao
ns "-march=loongarch32 -mabi=ilp32d -O2" } */ > +long long foo(long long *arr, long long index) > +{ > + return arr[index]; > +} > \ No newline at end of file Please don't leave files with no newline at end. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 13/31] LoongArch: Forbid stptr/ldptr when enable -fshrink-wrap.

2025-06-10 Thread Xi Ruoyao
+ if (IMM12_OPERAND (offset) > +     || (TARGET_64BIT && (offset < 32768))) >     bitmap_set_bit (components, regno); >   >   offset -= UNITS_PER_WORD; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 10/31] LoongArch: Disable extreme code model for crtbeginS.o on LA32

2025-06-10 Thread Xi Ruoyao
) IIRC this change is caused by a downstream autoconf patch used by some distro. Thus we need to regenerate the configure script with the vanilla autoconf-2.69 to avoid the unrelated change. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]

2025-06-04 Thread Xi Ruoyao
On Wed, 2025-05-28 at 18:17 +0100, Richard Sandiford wrote: > Sorry for the slow reply, had a few days off. > > Xi Ruoyao writes: > > If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the > > truncation is not a noop, then all bits of the inner reg are live.  We

Re: [RFC] RISC-V: Support -mcpu for XiangShan Kunminghu cpu.

2025-06-04 Thread Xi Ruoyao
zimop_zkn_zknd_zkne_zknh_zksed_zksh_zkt_zvbb_zvfh_" > +   "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b", IIUC zvl128b implies zvl32b and zvl64b, then should we explicitly give zvl32b and zvl64b here? > +   "xiangshan-kunminghu") -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Fwd: [PATCH] testsuite: Fix up dg-do-if

2025-05-26 Thread Xi Ruoyao
I forgot to send this to the list :(. Forwarded Message From: Xi Ruoyao To: Alexandre Oliva Cc: Xi Ruoyao Subject: [PATCH] testsuite: Fix up dg-do-if Date: 05/26/25 17:59:32 The line number needs to be passed to dg-do, instead of being stripped. Fixes 'compile: syntax

[PATCH v2] ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]

2025-05-23 Thread Xi Ruoyao
If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the truncation is not a noop, then all bits of the inner reg are live. We cannot reduce the live mask to that of the mode of the subreg. gcc/ChangeLog: PR rtl-optimization/120050 * ext-dce.cc (ext_dce_process_uses): Break

Re: [PATCH 1/3] LoongArch: testsuite: Fix pr112325.c and pr117888-1.c.

2025-05-23 Thread Xi Ruoyao
} } > */ >  /* { dg-additional-options "--param max-completely-peeled-insns=200" > { target powerpc64*-*-* } } */ > +/* { dg-additional-options "-mlsx" { target loongarch64-*-* } } */ >   >  typedef unsigned short ggml_fp16_t; >  static float table_f32_f16[1 << 16]; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: RISC-V TLS Descriptors in GCC

2025-05-22 Thread Xi Ruoyao
support for RISC-V TLSDESC? I don't think it's accurate. The RISC-V TLSDESC support is just not merged into Glibc yet. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] doc: Document the 'q' constraint for LoongArch

2025-05-21 Thread Xi Ruoyao
The kernel developers have requested such a constraint to use csrxchg in inline assembly. gcc/ChangeLog: * doc/md.texi: Document the 'q' constraint for LoongArch. --- Ok for trunk? gcc/doc/md.texi | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

Re: [PATCH 3/5] libstdc++: keep subtree sizes in pb_ds binary search trees (PR 81806)

2025-05-20 Thread Xi Ruoyao
On Tue, 2025-05-20 at 13:06 +0100, Jonathan Wakely wrote: > On 13/07/20 16:45 +0800, Xi Ruoyao via Libstdc++ wrote: > > > > > The second and third patch together resolve PR 81806. > > > > The attached patch modifies split_finish to use the subtree size we > >

Re: [RFC PATCH 0/3] _BitInt(N) support for LoongArch

2025-05-20 Thread Xi Ruoyao
required by the ISA spec. I'm trying to fix an ext-dce bug regarding !TARGET_TRULY_NOOP_TRUNCATION so I just decided to chime in and explain this :). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ext-dce: Only transform extend to subreg if TRULY_NOOP_TRUNCATION [PR 120050]

2025-05-12 Thread Xi Ruoyao
On Mon, 2025-05-12 at 12:59 +0100, Richard Sandiford wrote: > Xi Ruoyao writes: > > The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these > > machines the hardware may look at bits outside of the given mode. > > > > gcc/ChangeLog: > > &

[PATCH] ext-dce: Only transform extend to subreg if TRULY_NOOP_TRUNCATION [PR 120050]

2025-05-12 Thread Xi Ruoyao
The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these machines the hardware may look at bits outside of the given mode. gcc/ChangeLog: PR rtl-optimization/120050 * ext-dce.cc (ext_dce_try_optimize_insn): Only transform the insn if TRULY_NOOP_TRUNCATION. -

Re: [PATCH 30/61] MSA: Make MSA and microMIPS R5 unsupported

2025-04-27 Thread Xi Ruoyao
combination: %s", "-mmicromips -mmsa"); And should this line be updated too like "-mmicromips -mmsa is only supported for MIPSr6"? Unfortunately the original patch is already applied and breaking even a non-bootstrapping build for MIPS. Thus a fix is needed ASAP or we'd revert the original patch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Pushed r15-9167: [PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-04 Thread Xi Ruoyao
On Thu, 2025-04-03 at 10:13 +0800, Lulu Cheng wrote: > > 在 2025/4/2 上午11:19, Xi Ruoyao 写道: > > Avoid using gensub that FreeBSD awk lacks, use gsub and split those > > each > > of gawk, mawk, and FreeBSD awk provides. > > > > Reported-by: mp...@vip.163.com &

[PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-01 Thread Xi Ruoyao
Avoid using gensub that FreeBSD awk lacks, use gsub and split those each of gawk, mawk, and FreeBSD awk provides. Reported-by: mp...@vip.163.com Link: https://man.freebsd.org/cgi/man.cgi?query=awk gcc/ChangeLog: * config/loongarch/genopts/gen-evolution.awk: Avoid using gensub tha

[gcc-14 PATCH] Reuse scratch registers generated by LRA

2025-03-27 Thread Xi Ruoyao
From: Denis Chertykov Test file: udivmoddi.c problem insn: 484 Before LRA pass we have: (insn 484 483 485 72 (parallel [ (set (reg/v:SI 143 [ __q1 ]) (plus:SI (reg/v:SI 143 [ __q1 ]) (const_int -2 [0xfffe]))) (clobber (scrat

[PATCH] LoongArch: Add ABI names for FPR

2025-03-15 Thread Xi Ruoyao
We already allow the ABI names for GPR in inline asm clobber list, so for consistency allow the ABI names for FPR as well. Reported-by: Yao Zi gcc/ChangeLog: * config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add fa0-fa7, ft0-ft16, and fs0-fs7. gcc/testsuite/ChangeLog:

[PATCH] LoongArch: Don't use C++17 feature [PR119238]

2025-03-12 Thread Xi Ruoyao
Structured binding is a C++17 feature but the GCC code base is in C++14. gcc/ChangeLog: PR target/119238 * config/loongarch/simd.md (dot_prod): Stop using structured binding. --- Ok for trunk? gcc/config/loongarch/simd.md | 14 -- 1 file changed, 8 insertion

[PATCH] LoongArch: Fix ICE when trying to recognize bitwise + alsl.w pair [PR119127]

2025-03-11 Thread Xi Ruoyao
When we call loongarch_reassoc_shift_bitwise for _alsl_reversesi_extend, the mask is in DImode but we are trying to operate on it in SImode, causing an ICE. To fix the issue, sign-extend the mask into the mode we want. And also specially handle the case where the mask is extended into -1 to avoid a miss-op

Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-04 Thread Xi Ruoyao
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote: > LGTM! Pushed to trunk. The draft of gcc-14 backport is attached, I'll push it if it builds & tests fine and there's no objection. -- Xi Ruoyao School of Aerospace Science and Technology, Xidia

[PATCH 08/17] LoongArch: Implement subword atomic_fetch_{and, or, xor} with am*.w instructions

2025-03-03 Thread Xi Ruoyao
We can just shift the mask and fill the other bits with 0 (for ior/xor) or 1 (for and), and use an am*.w instruction to perform the atomic operation, instead of using a LL-SC loop. gcc/ChangeLog: * config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND): Remove. (UNSPEC_COM
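
The widening trick above can be sketched in plain, non-atomic C. The helper names below are hypothetical illustrations; the real patch emits a single am*.w instruction on the whole aligned word. The key is that bits outside the subword are filled with the identity element of the operation (0 for or/xor, 1 for and), so the full-word operation leaves them untouched:

```c
#include <stdint.h>

/* Hypothetical sketch: perform a byte-wide OR/AND inside a 32-bit word
   with one full-word operation, by filling the bits outside the byte
   with the operation's identity element.  */
static uint32_t subword_or(uint32_t word, unsigned shift, uint8_t val)
{
  /* Bits outside the byte are 0, the identity for OR (and XOR).  */
  return word | ((uint32_t)val << shift);
}

static uint32_t subword_and(uint32_t word, unsigned shift, uint8_t val)
{
  /* Bits outside the byte are 1, the identity for AND.  */
  return word & (((uint32_t)val << shift) | ~((uint32_t)0xff << shift));
}
```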

[PATCH 13/17] LoongArch: Add -m[no-]scq option

2025-03-03 Thread Xi Ruoyao
We'll use the sc.q instruction for some 16-byte atomic operations, but it's only added in LoongArch 1.1 evolution so we need to gate it with an option. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (scq): New evolution feature. * config/loongarch/loongarch-evo

[PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-02 Thread Xi Ruoyao
They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reorder has been observed in openh264 LTO build. Expand them to a (mem) expression instead of unspec to fix the issue. Then we need

[PATCH 01/17] LoongArch: (NFC) Remove atomic_optab and use amop instead

2025-03-02 Thread Xi Ruoyao
They are the same. gcc/ChangeLog: * config/loongarch/sync.md (atomic_optab): Remove. (atomic_): Change atomic_optab to amop. (atomic_fetch_): Likewise. --- gcc/config/loongarch/sync.md | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/config/lo

[PATCH 05/17] LoongArch: Don't emit overly-restrictive barrier for LL-SC loops

2025-03-01 Thread Xi Ruoyao
For LL-SC loops, if the atomic operation has succeeded, the SC instruction always implies a full barrier, so the barrier we manually inserted only needs to account for the failure memorder, not the success memorder (the barrier is skipped with "b 3f" on success anyway). Note that if we use

[PATCH 17/17] LoongArch: Implement 16-byte atomic add, sub, and, or, xor, and nand with sc.q

2025-03-01 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec. (UNSPEC_TI_FETCH_SUB): Likewise. (UNSPEC_TI_FETCH_AND): Likewise. (UNSPEC_TI_FETCH_XOR): Likewise. (UNSPEC_TI_FETCH_OR): Likewise. (UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Like

[PATCH 11/17] LoongArch: Implement 16-byte atomic load with LSX

2025-03-01 Thread Xi Ruoyao
If the vector is naturally aligned, it cannot cross cache lines so the LSX load is guaranteed to be atomic. Thus we can use LSX to do the lock-free atomic load, instead of using a lock. gcc/ChangeLog: * config/loongarch/sync.md (atomic_loadti_lsx): New define_insn. (atomic_loadti
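
The alignment argument is simple arithmetic: a 16-byte object aligned to 16 bytes always lies within one cache line for any line size that is a multiple of 16 (e.g. 64 bytes). A sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* A naturally aligned 16-byte access stays in one cache line: its first
   and last bytes map to the same line index.  Unaligned accesses may not.  */
static int same_line(uintptr_t addr, uintptr_t line_size)
{
  return addr / line_size == (addr + 15) / line_size;
}
```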

[PATCH 03/17] LoongArch: Don't use "+" for atomic_{load, store} "m" constraint

2025-02-28 Thread Xi Ruoyao
Atomic load does not modify the memory. Atomic store does not read the memory, thus we can use "=" instead. gcc/ChangeLog: * config/loongarch/sync.md (atomic_load): Remove "+" for the memory operand. (atomic_store): Use "=" instead of "+" for the memory operand. -

[PATCH 15/17] LoongArch: Implement 16-byte CAS with sc.q

2025-02-28 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (atomic_compare_and_swapti_scq): New define_insn. (atomic_compare_and_swapti): New define_expand. --- gcc/config/loongarch/sync.md | 89 1 file changed, 89 insertions(+) diff --git a/gcc/config

[PATCH 10/17] LoongArch: Implement atomic_fetch_nand

2025-02-28 Thread Xi Ruoyao
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand is expanded to a loop containing a CAS in the body, and CAS itself is a LL-SC loop so we have a nested loop. This is obviously not a good idea as we just need one LL-SC loop in fact. As ~(atom & mask) is (~mask) | (~atom), w
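
The De Morgan step quoted above can be checked directly. Here nand_bits is a hypothetical, non-atomic stand-in for the operation the LL-SC body computes:

```c
#include <stdint.h>

/* The identity used above: ~(atom & mask) == (~mask) | (~atom),
   which lets a subword fetch-nand be built from bitwise operations
   inside a single LL-SC loop instead of a nested CAS loop.  */
static uint32_t nand_bits(uint32_t atom, uint32_t mask)
{
  return ~(atom & mask);
}
```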

[PATCH 06/17] LoongArch: Remove unneeded "b 3f" instruction after LL-SC loops

2025-02-28 Thread Xi Ruoyao
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa or the memory model requires a barrier on failure. But with -mld-seq-sa and other memory models the barrier may not exist at all, and we should remove the "b 3f" instruction as well. The implementation uses a new operand

[PATCH 16/17] LoongArch: Implement 16-byte atomic exchange with sc.q

2025-02-28 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (atomic_exchangeti_scq): New define_insn. (atomic_exchangeti): New define_expand. --- gcc/config/loongarch/sync.md | 35 +++ 1 file changed, 35 insertions(+) diff --git a/gcc/config/loongarch/sync.m

[PATCH 09/17] LoongArch: Don't expand atomic_fetch_sub_{hi, qi} to LL-SC loop if -mlam-bh

2025-02-28 Thread Xi Ruoyao
With -mlam-bh, we should negate the addend first, and use an amadd instruction. Disabling the expander makes the compiler do it correctly. gcc/ChangeLog: * config/loongarch/sync.md (atomic_fetch_sub): Disable if ISA_HAS_LAM_BH. --- gcc/config/loongarch/sync.md | 2 +- 1 file cha
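
The underlying identity is that fetch_sub(v) equals fetch_add(-v) in wrap-around arithmetic. A hedged C11 sketch; the function name is made up, and GCC of course does this at the RTL level so the amadd instruction can be used:

```c
#include <stdatomic.h>

/* Hypothetical sketch: subword atomic subtraction expressed as an atomic
   add of the negated addend, mirroring the negate-then-amadd strategy.  */
static short fetch_sub_via_add(_Atomic short *p, short v)
{
  return atomic_fetch_add(p, (short)-v);
}
```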

[PATCH 14/17] LoongArch: Implement 16-byte atomic store with sc.q

2025-02-28 Thread Xi Ruoyao
When LSX is not available but sc.q is (for example on LA664 where the SIMD unit is not enabled), we can use a LL-SC loop for 16-byte atomic store. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand_reloc): Accept "%t" for printing the number of the 64-bit mach

[PATCH 12/17] LoongArch: Implement 16-byte atomic store with LSX

2025-02-28 Thread Xi Ruoyao
If the vector is naturally aligned, it cannot cross cache lines so the LSX store is guaranteed to be atomic. Thus we can use LSX to do the lock-free atomic store, instead of using a lock. gcc/ChangeLog: * config/loongarch/sync.md (atomic_storeti_lsx): New define_insn. (at

[PATCH 07/17] LoongArch: Remove unneeded "andi offset, addr, 3" instruction in atomic_test_and_set

2025-02-28 Thread Xi Ruoyao
On LoongArch sll.w and srl.w instructions only take the [4:0] bits of rk (shift amount) into account, and we've already defined SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact, thus we don't need this instruction. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_and_set):
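
The shift-amount truncation can be modeled in C. sll_w below is a hypothetical model of the instruction's semantics, not GCC code; it shows why a separate "andi offset, addr, 3"-style masking of the shift amount is redundant:

```c
#include <stdint.h>

/* Model of LoongArch sll.w: only bits [4:0] of the shift amount matter,
   so an explicit "& 31" (or "andi") before the shift is redundant.  */
static uint32_t sll_w(uint32_t rj, uint32_t rk)
{
  return rj << (rk & 31);
}
```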

[PATCH 02/17] LoongArch: (NFC) Remove amo and use size instead

2025-02-28 Thread Xi Ruoyao
They are the same. gcc/ChangeLog: * config/loongarch/sync.md: Use <size> instead of <amo>. (amo): Remove. --- gcc/config/loongarch/sync.md | 53 +--- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loo

[PATCH 04/17] LoongArch: Allow using bstrins for masking the address in atomic_test_and_set

2025-02-28 Thread Xi Ruoyao
We can use bstrins for masking the address here. As people are already working on LA32R (which lacks bstrins instructions), for future-proofing we check whether (const_int -4) is an and_operand and force it into a register if not. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_a

[PATCH 00/17] LoongArch: Clean up atomic operations and implement 16-byte atomic operations

2025-02-28 Thread Xi Ruoyao
The entire patch bootstrapped and regtested on loongarch64-linux-gnu with -march=la664, and I've also tried several simple 16-byte atomic operation tests locally. OK for trunk? Or maybe the clean up is OK but the 16-byte atomic implementation still needs to be confirmed by the hardware team

[PATCH] LoongArch: Add a dedicated pattern for bitwise + alsl

2025-02-28 Thread Xi Ruoyao
We've implemented the slli + bitwise => bitwise + slli reassociation in r15-7062. I'd hoped late combine could handle slli.d + bitwise + add.d => bitwise + slli.d + add.d => bitwise + alsl.d, but it does not always work, for example a |= 0xfff; b |= 0xfff; a <<= 2; b <<= 2; a += x; b
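
The reassociation relies on shifts distributing over bitwise operations: (a | c) << n == (a << n) | (c << n). A small sketch with made-up function names, assuming the a |= 0xfff; a <<= 2; a += x example from the description:

```c
/* Original association: OR, then shift, then add.  */
static long alsl_form(long a, long x)
{
  return ((a | 0xfffL) << 2) + x;
}

/* Reassociated: the constant is pre-shifted, so the remaining
   shift-and-add can become a single alsl.d instruction.  */
static long reassoc_form(long a, long x)
{
  return ((a << 2) | (0xfffL << 2)) + x;
}
```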

Re: [PATCH] LoongArch: Avoid unnecessary zero-initialization using LSX for scalar popcount

2025-02-25 Thread Xi Ruoyao
On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote: > > 在 2025/2/22 下午3:34, Xi Ruoyao 写道: > > Now for __builtin_popcountl we are getting things like > > > > vrepli.b$vr0,0 > > vinsgr2vr.d $vr0,$r4,0 > > vpcnt.d $vr0,$vr0 > >

[PATCH] LoongArch: Avoid unnecessary zero-initialization using LSX for scalar popcount

2025-02-21 Thread Xi Ruoyao
Now for __builtin_popcountl we are getting things like vrepli.b $vr0,0 vinsgr2vr.d $vr0,$r4,0 vpcnt.d $vr0,$vr0 vpickve2gr.du $r4,$vr0,0 slli.w $r4,$r4,0 jr $r1 The "vrepli.b" instruction is introduced by the init-regs pass (see PR618

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-19 Thread Xi Ruoyao
assumptions about the rounding modes in > floating-point > calculations, such as in float_extend, which may prevent CSE optimizations. > Could > this also lead to lost optimization opportunities in other areas that don't > require > this option? I'm not sure. > > I suspect that the best approach would be to define relevant > attributes (perhaps similar to -frounding-math) within specific related > patterns/built-ins > to inform optimizers we are using a rounding mode and to avoid > over-optimization. The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that we've not implemented. See https://gcc.gnu.org/PR34678. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch

2025-02-18 Thread Xi Ruoyao
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote: > Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs > even with -fno-pic. > > gcc/testsuite/ChangeLog: > > * c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3 > %c4 on LoongArc

[PATCH] LoongArch: Use normal RTL pattern instead of UNSPEC for {x, }vsr{a, l}ri instructions

2025-02-14 Thread Xi Ruoyao
Allowing (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding shift operation. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove. (UNSPEC_LASX_XVSRLRI): Remove. (lasx_xvsrari_): Remove. (lasx_xvsrlri_): Remove. * config/loonga
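
The recognized expression rounds to nearest (halves rounding up) before shifting: adding half of the discarded weight, then shifting. A hypothetical scalar model of the vector operation:

```c
#include <stdint.h>

/* Model of a rounding right shift: (t + (1 << imm >> 1)) >> imm.
   For imm == 0 the added term is 0 and the value is unchanged.  */
static uint64_t round_shift_right(uint64_t t, unsigned imm)
{
  return (t + ((1ull << imm) >> 1)) >> imm;
}
```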

Re: [PATCH v2 2/8] LoongArch: Allow moving TImode vectors

2025-02-14 Thread Xi Ruoyao
On Fri, 2025-02-14 at 15:46 +0800, Lulu Cheng wrote: > Hi, > > If only apply the first and second patches, the code will not compile. > > Otherwise LGTM. Fixed in v3: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675776.html -- Xi Ruoyao School of Aerospace Science

[PATCH v3 5/8] LoongArch: Simplify {lsx_,lasx_x}vmaddw description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeL

[PATCH v3 4/8] LoongArch: Simplify {lsx_, lasx_x}vh{add, sub}w description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX

[PATCH v3 6/8] LoongArch: Simplify lsx_vpick description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has a different semantic. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_

[PATCH v3 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

2025-02-14 Thread Xi Ruoyao
Despite being just a special case of "a widening product whose result is used for reduction," having these standard names allows recognizing the dot product pattern earlier, and it may be beneficial to optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chai

[PATCH v3 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-14 Thread Xi Ruoyao
Since PR116142 has been fixed, now we can add the standard names so the compiler will generate better code if the result of a widening production is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/test

[PATCH v3 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

2025-02-14 Thread Xi Ruoyao
For a = (v4si){0x, 0x, 0x, 0x} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 It's because we only tried vrepli.d which wouldn't work. Try all vrepli instructions for const int vector
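
Why a byte-broadcasting vrepli.b can materialize such a constant: a 32-bit lane consisting of one repeated byte b always equals b * 0x01010101. A sketch with a hypothetical function name (the 0xdd byte matches the vrepli.b operand quoted above):

```c
#include <stdint.h>

/* A 32-bit lane filled with one repeated byte b is b * 0x01010101;
   since b < 0x100 the multiplication produces no carries between bytes.  */
static uint32_t rep_byte(uint8_t b)
{
  return (uint32_t)b * 0x01010101u;
}
```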

[PATCH v3 2/8] LoongArch: Allow moving TImode vectors

2025-02-14 Thread Xi Ruoyao
We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. In preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX reg

[PATCH v3 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

2025-02-14 Thread Xi Ruoyao
These pattern definitions are tediously long, invoking 32 UNSPECs and many hard-coded long const vectors. To simplify them, at first we use the TImode vector operations instead of the UNSPECs, then we adopt an approach in AArch64: using a special predicate to match the const vectors for odd/even i

[PATCH v3 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

2025-02-14 Thread Xi Ruoyao
tested on loongarch64-linux-gnu, no new code change in v3. Ok for trunk? Xi Ruoyao (8): LoongArch: Try harder using vrepli instructions to materialize const vectors LoongArch: Allow moving TImode vectors LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description LoongArch: Si

[PATCH v2 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-13 Thread Xi Ruoyao
Since PR116142 has been fixed, now we can add the standard names so the compiler will generate better code if the result of a widening production is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/test

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-13 Thread Xi Ruoyao
n test the optimal > values > > for -malign-{functions,labels,jumps,loops} on that basis. Thanks! -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH v2 6/8] LoongArch: Simplify lsx_vpick description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has a different semantic. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_

[PATCH v2 5/8] LoongArch: Simplify {lsx_,lasx_x}vmaddw description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeL

[PATCH v2 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

2025-02-13 Thread Xi Ruoyao
Despite being just a special case of "a widening product whose result is used for reduction," having these standard names allows recognizing the dot product pattern earlier, and it may be beneficial to optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chai

[PATCH v2 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

2025-02-13 Thread Xi Ruoyao
These pattern definitions are tediously long, invoking 32 UNSPECs and many hard-coded long const vectors. To simplify them, at first we use the TImode vector operations instead of the UNSPECs, then we adopt an approach in AArch64: using a special predicate to match the const vectors for odd/even i

[PATCH v2 2/8] LoongArch: Allow moving TImode vectors

2025-02-13 Thread Xi Ruoyao
We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. In preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX reg

[PATCH v2 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

2025-02-13 Thread Xi Ruoyao
For a = (v4si){0x, 0x, 0x, 0x} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 It's because we only tried vrepli.d which wouldn't work. Try all vrepli instructions for const int vector

[PATCH v2 4/8] LoongArch: Simplify {lsx_, lasx_x}vh{add, sub}w description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX

[PATCH v2 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

2025-02-13 Thread Xi Ruoyao
is selected for the left operand of addsub. Swap the operands if needed when outputting the asm. - Fix typos in commit subjects. - Mention V2TI in loongarch-modes.def. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (8): LoongArch: Try harder using vrepli instructions

Re: [PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Xi Ruoyao
On Thu, 2025-02-13 at 09:24 +0800, Lulu Cheng wrote: > > 在 2025/2/12 下午6:19, Xi Ruoyao 写道: > > On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote: > > > > /* snip */ > > > > > diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c > > > b

Re: [PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Xi Ruoyao
oongarch/pr118828-4.c > @@ -0,0 +1,55 @@ > +/* { dg-do run } */ > +/* { dg-options "-mtune=la464" } */ > + > +#include > +#include > +#include > + > +#ifndef __loongarch_tune > +#error __loongarch_tune should not be available here Likewise. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-11 Thread Xi Ruoyao
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote: > > 在 2025/2/7 下午8:09, Xi Ruoyao 写道: > /* snip */ > > - > > -(define_insn "lasx_xvpickev_w" > > -  [(set (match_operand:V8SI 0 "register_operand" "=f") > > - (vec_select:V8S

Re: [PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-11 Thread Xi Ruoyao
_LSX) > -    { > -  builtin_define ("__loongarch_simd"); > -  builtin_define ("__loongarch_sx"); > - > -  if (!ISA_HAS_LASX) > - builtin_define ("__loongarch_simd_width=128"); > -    } > - > -  if (ISA_HAS_LASX) > -    { >

Re: [PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description

2025-02-11 Thread Xi Ruoyao
On Tue, 2025-02-11 at 15:49 +0800, Lulu Cheng wrote: > It seems that the title here is "{lsx_,lasx_x}vmaddw". Will fix in v2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description

2025-02-11 Thread Xi Ruoyao
On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote: > Hi, > >   I think , the "{lsx_,lasx_x}hv{add,sub}w" in the title should be > "{lsx_,lasx_x}vh{add,sub}w". Indeed. > > 在 2025/2/7 下午8:09, Xi Ruoyao 写道: > > Like what we've done for {ls

[PATCH] LoongArch: Accept ADD, IOR or XOR when combining objects with no bits in common [PR115478]

2025-02-10 Thread Xi Ruoyao
Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR. It's generally a good thing (allowing the use of our alsl instruction, or similar instructions on other architectures), but it's preventing us from using bytepick. For example, if we shift a __int128 by 16 bits, the higher word can
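The property the patch relies on is that when two values share no set bits, PLUS, IOR, and XOR all compute the same result, so the PLUS emitted since r15-1120 can still be matched where an IOR (e.g. for bytepick) was expected. An illustrative check (not code from the patch):

```c
#include <stdint.h>

/* Returns 1 if a + b, a | b and a ^ b agree, given that a and b have
   no bits in common -- the equivalence behind treating PLUS/IOR/XOR
   interchangeably in this case.  (Illustrative, not patch code.)  */
static int disjoint_ops_agree (uint64_t a, uint64_t b)
{
  if ((a & b) != 0)
    return 0;                       /* precondition violated */
  return (a + b) == (a | b) && (a + b) == (a ^ b);
}
```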

[PATCH 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-07 Thread Xi Ruoyao
Since PR116142 has been fixed, we can now add the standard names so the compiler will generate better code if the result of a widening product is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/test

[PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description

2025-02-07 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder the two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeL
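In scalar terms, the combination combine is meant to recognize looks like a widening multiply of even-indexed elements accumulated into the destination. A sketch (function name is mine, not patch code):

```c
#include <stdint.h>

/* Scalar model of {x,}vmaddwev: widening multiply of even-indexed
   elements, accumulated into acc -- the {x,}vadd + {x,}vmulwev
   combination described above.  (Sketch for illustration only.)  */
void vmaddwev_w_h (int32_t acc[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) a[2 * i] * (int32_t) b[2 * i];
}
```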

[PATCH 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

2025-02-07 Thread Xi Ruoyao
These pattern definitions are tediously long, invoking 32 UNSPECs and many hard-coded long const vectors. To simplify them, we first use TImode vector operations instead of the UNSPECs, then adopt an approach from AArch64: using a special predicate to match the const vectors for odd/even i

[PATCH 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

2025-02-07 Thread Xi Ruoyao
Although it's just a special case of "a widening product of which the result is used for reduction," having these standard names allows the dot product pattern to be recognized earlier, which may be beneficial for optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chai

[PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-07 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvpickev_b): Remove. (lasx_xvpickev_h): Remove. (lasx_xvpickev_w): Remove. (lasx_xvpickev_w_f):

[PATCH 4/8] LoongArch: Simplify {lsx_, lasx_x}hv{add, sub}w description

2025-02-07 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX

[PATCH 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

2025-02-07 Thread Xi Ruoyao
For a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 It's because we only tried vrepli.d, which wouldn't work. Try all vrepli instructions for const int vector

[PATCH 2/8] LoongArch: Allow moving TImode vectors

2025-02-07 Thread Xi Ruoyao
We have some vector instructions that operate on 128-bit integer (i.e. TImode) vectors. Previously they had been modeled with unspecs, but it's more natural to model them with TImode vector RTL expressions. In preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX reg

[PATCH 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

2025-02-07 Thread Xi Ruoyao
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (8): LoongArch: Try harder using vrepli instructions to materialize const vectors LoongArch: Allow moving TImode vectors LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description LoongArch: Simplify
