with a
> simple loop.
It's not whether string.h is included. It's whether string.h provides rawmemchr.
rawmemchr is not a standard C function. It's a GNU extension and GCC is
expected to work on various non-GNU systems.
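For illustration (sketch mine, not from the thread; find_byte is a made-up name): a portable caller needs a fallback like this simple loop when the C library does not provide rawmemchr.

#define _GNU_SOURCE
#include <string.h>

const char *
find_byte (const char *p, int c)
{
#ifdef __GLIBC__
  return rawmemchr (p, c);          /* GNU extension */
#else
  while (*p != (char) c)            /* caller guarantees c occurs in p */
    ++p;
  return p;
#endif
}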
--
Xi Ruoyao
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1. Loading the selector into target
would clobber the input, producing wrong code like
vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1
So don't load the selector into d->target; use a new pseudo to h
On Fri, 2025-07-11 at 14:01 -0500, Peter Bergner wrote:
> On 7/11/25 10:22 AM, Vladimir Makarov wrote:
> > On 7/8/25 9:43 PM, Xi Ruoyao wrote:
> > >
> > > IIUC "recog does not look at constraints until reload" has been a
> > > well-established rule
The PR 87600 fix has disallowed reloading user hard registers to resolve
earlyclobber-induced conflicts.
However before reload, recog completely ignores the constraints of
insns, so the RTL passes may produce insns where some user hard
registers violate an earlyclobber. Then we'll get an ICE witho
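A reduced sketch (mine, not the PR testcase) of an earlyclobber output together with a user-specified hard register input; the register choice $t0 is only illustrative:

long
f (long a)
{
  register long in asm ("t0") = a;   /* user-specified hard register */
  long out;
  /* "&" marks the output as earlyclobber; recog only enforces such
     constraints at reload time.  */
  asm ("slli.d %0, %1, 1" : "=&r" (out) : "r" (in));
  return out;
}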
I'm going to refine a part of the PR 87600 fix which seems to trigger
PR 120983, from which LoongArch is particularly suffering. Enable the PR 87600
tests so I won't regress PR 87600.
gcc/testsuite/ChangeLog:
PR rtl-optimization/87600
PR rtl-optimization/120983
* gcc.dg/pr87600
Bootstrapped and regtested on aarch64-linux-gnu, loongarch64-linux-gnu,
and x86_64-linux-gnu. Ok for trunk?
Xi Ruoyao (2):
testsuite: Enable the PR 87600 tests for LoongArch
lra: Reallow reloading user hard registers if the insn is not asm [PR
120983]
gcc/lra-constraints.cc
On Sat, 2025-07-05 at 14:10 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Sat, Jul 05, 2025 at 11:10:05PM +0800, Xi Ruoyao wrote:
> > Possibly this is https://gcc.gnu.org/PR101882. Specifically comment 5
> > from Segher:
> >
> > "The LRA change is corre
On Sat, 2025-07-05 at 17:55 +0800, Xi Ruoyao wrote:
> On Sat, 2025-07-05 at 11:20 +0800, Lulu Cheng wrote:
> > For the gcc.target/loongarch/bitwise-shift-reassoc-clobber.c,
> > some extensions are eliminated in ext_dce in commit r16-1835.
> >
> > This will result
"register_operand" "r"]
> "TARGET_64BIT
> && loongarch_reassoc_shift_bitwise (, operands[2], operands[3],
> - SImode)"
> + SImode)
> + && !(GP_REG_P (REGNO (operands[0]))
> + && REGNO (operands[0]) == REGNO (operands[4]))"
> "#"
> "&& reload_completed"
> [; r0 = r1 [&|^] r3 is emitted in PREPARATION-STATEMENTS because we
--
Xi Ruoyao
On Fri, 2025-07-04 at 11:14 +0800, Xi Ruoyao wrote:
> On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote:
> >
> > 在 2025/7/2 下午3:31, Xi Ruoyao 写道:
> > > The register_operand predicate can match subreg, then we'd have a
> > > subreg
> > > of subreg
On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote:
>
> 在 2025/7/2 下午3:31, Xi Ruoyao 写道:
> > The register_operand predicate can match subreg, then we'd have a subreg
> > of subreg and it's invalid. Use lowpart_subreg to avoid the nested
> >
In GCC 16 the compiler is smarter and it optimizes away the unneeded
zero-extension during the expand pass. Thus we can no longer match
and_alsl_reversed.
Drop the scan-rtl-dump for and_alsl_reversed and add scan-assembler-not
against bstrpick.d to detect the unneeded zero-extension in case it
re
The register_operand predicate can match a subreg, and then we'd have a
subreg of a subreg, which is invalid. Use lowpart_subreg to avoid the
nested subreg.
gcc/ChangeLog:
* config/loongarch/loongarch.md (crc_combine): Avoid nested
subreg.
gcc/testsuite/ChangeLog:
* gcc.c-tortu
load,store,mgtf,fpload,mftg,fpstore")
> + (set_attr "mode" "SI")])
> +
> +(define_insn_and_split "*movsi_internal_la32"
> + [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,*f,f,*r,*m")
> + (match_operand:SI 1 "move_operand" "r,Yd,m,rJ,*r*J,m,*f,*f"))]
> + "TARGET_32BIT
> + && (register_operand (operands[0], SImode)
> + || reg_or_0_operand (operands[1], SImode))"
> { return loongarch_output_move (operands); }
> "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
> (operands[0]))"
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
(op, 0), mode)")))
> + (match_test "TARGET_64BIT && loongarch_base_index_address_p (XEXP
> (op, 0), mode)")))
IMO it's more natural to do
(and (match_code "mem")
(match_test "TARGET_64BIT")
(match_test "loongarch_b
t; +}
> + [(set (attr "length") (const_int 20))])
>
> (define_insn "atomic_exchange"
> [(set (match_operand:GPR 0 "register_operand" "=&r")
> @@ -217,10 +244,21 @@ (define_insn "atomic_exchange"
> (match_operand:
n the lines below).
> +#define ABI_BASE_ILP32F 1
> +#define ABI_BASE_ILP32S 2
> +#define ABI_BASE_LP64D 3
> +#define ABI_BASE_LP64F 4
> +#define ABI_BASE_LP64S 5
> +#define N_ABI_BASE_TYPES 6
>
> extern loongarch_def_array
> loongarch_abi_base_strings;
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ns "-march=loongarch32 -mabi=ilp32d -O2" } */
> +long long foo(long long *arr, long long index)
> +{
> + return arr[index];
> +}
> \ No newline at end of file
Please don't leave files with no newline at end.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
+ if (IMM12_OPERAND (offset)
> + || (TARGET_64BIT && (offset < 32768)))
> bitmap_set_bit (components, regno);
>
> offset -= UNITS_PER_WORD;
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
)
IIRC this change is caused by a downstream autoconf patch used by some
distro. Thus we need to regenerate the configure script with the
vanilla autoconf-2.69 to avoid the unrelated change.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2025-05-28 at 18:17 +0100, Richard Sandiford wrote:
> Sorry for the slow reply, had a few days off.
>
> Xi Ruoyao writes:
> > If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the
> > truncation is not a noop, then all bits of the inner reg are live. We
zimop_zkn_zknd_zkne_zknh_zksed_zksh_zkt_zvbb_zvfh_"
> + "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
IIUC zvl128b implies zvl32b and zvl64b, so should we really explicitly give
zvl32b and zvl64b here?
> + "xiangshan-kunminghu")
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
I forgot to send this to the list :(.
Forwarded Message
From: Xi Ruoyao
To: Alexandre Oliva
Cc: Xi Ruoyao
Subject: [PATCH] testsuite: Fix up dg-do-if
Date: 05/26/25 17:59:32
The line number needs to be passed to dg-do, instead of being stripped.
Fixes 'compile: syntax
If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the
truncation is not a noop, then all bits of the inner reg are live. We
cannot reduce the live mask to that of the mode of the subreg.
gcc/ChangeLog:
PR rtl-optimization/120050
* ext-dce.cc (ext_dce_process_uses): Break
} }
> */
> /* { dg-additional-options "--param max-completely-peeled-insns=200"
> { target powerpc64*-*-* } } */
> +/* { dg-additional-options "-mlsx" { target loongarch64-*-* } } */
>
> typedef unsigned short ggml_fp16_t;
> static float table_f32_f16[1 << 16];
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
upport for RISC-V TLSDESC?
I don't think it's accurate. The RISC-V TLSDESC support is just not
merged into Glibc yet.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
The kernel developers have requested such a constraint to use csrxchg
in inline assembly.
gcc/ChangeLog:
* doc/md.texi: Document the 'q' constraint for LoongArch.
---
Ok for trunk?
gcc/doc/md.texi | 3 +++
1 file changed, 3 insertions(+)
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
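For illustration only (not from the patch; the operand roles and the CSR number 0x0 are my assumptions): the intended inline-asm usage of the 'q' constraint would look roughly like this, with 'q' on the rj operand that must not be $r0/$r1.

static inline unsigned long
csr_xchg_bits (unsigned long val, unsigned long mask)
{
  /* csrxchg rd, rj, csr_num: rj selects which CSR bits to write, so it
     gets the 'q' constraint; 0x0 is a placeholder CSR number.  */
  asm volatile ("csrxchg %0, %1, 0x0"
                : "+r" (val)
                : "q" (mask)
                : "memory");
  return val;
}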
On Tue, 2025-05-20 at 13:06 +0100, Jonathan Wakely wrote:
> On 13/07/20 16:45 +0800, Xi Ruoyao via Libstdc++ wrote:
> >
> > > The second and third patch together resolve PR 81806.
> >
> > The attached patch modifies split_finish to use the subtree size we
> >
quired by the ISA spec.
I'm trying to fix an ext-dce bug regarding !TARGET_TRULY_NOOP_TRUNCATION
so I just decided to chime in and explain this :).
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Mon, 2025-05-12 at 12:59 +0100, Richard Sandiford wrote:
> Xi Ruoyao writes:
> > The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these
> > machines the hardware may look at bits outside of the given mode.
> >
> > gcc/ChangeLog:
> >
&
The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these
machines the hardware may look at bits outside of the given mode.
gcc/ChangeLog:
PR rtl-optimization/120050
* ext-dce.cc (ext_dce_try_optimize_insn): Only transform the
insn if TRULY_NOOP_TRUNCATION.
-
bination: %s", "-mmicromips -mmsa");
And should this line be updated too, e.g. to "-mmicromips -mmsa is only
supported for MIPSr6"?
Unfortunately the original patch is already applied and breaks even a
non-bootstrapping build for MIPS. Thus a fix is needed ASAP or we'll have
to revert the original patch.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Thu, 2025-04-03 at 10:13 +0800, Lulu Cheng wrote:
>
> 在 2025/4/2 上午11:19, Xi Ruoyao 写道:
> > Avoid using gensub that FreeBSD awk lacks, use gsub and split those
> > each
> > of gawk, mawk, and FreeBSD awk provides.
> >
> > Reported-by: mp...@vip.163.com
&
Avoid using gensub, which FreeBSD awk lacks; use gsub and split, which
gawk, mawk, and FreeBSD awk each provide.
Reported-by: mp...@vip.163.com
Link: https://man.freebsd.org/cgi/man.cgi?query=awk
gcc/ChangeLog:
* config/loongarch/genopts/gen-evolution.awk: Avoid using gensub
tha
From: Denis Chertykov
Test file: udivmoddi.c
problem insn: 484
Before LRA pass we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffe])))
(clobber (scrat
We already allow the ABI names for GPRs in the inline asm clobber list, so
for consistency allow the ABI names for FPRs as well.
Reported-by: Yao Zi
gcc/ChangeLog:
* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
fa0-fa7, ft0-ft16, and fs0-fs7.
gcc/testsuite/ChangeLog:
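As an illustration (mine, not part of the patch), a clobber list can then spell FPRs by their ABI names just like it already can for GPRs:

void
clobber_some_fprs (void)
{
  /* With the ABI names in ADDITIONAL_REGISTER_NAMES, these clobbers can
     be written the same way GPR ABI names (t0, a0, ...) already can.  */
  asm volatile ("" ::: "fa0", "ft0", "fs0");
}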
Structured binding is a C++17 feature but the GCC code base is in C++14.
gcc/ChangeLog:
PR target/119238
* config/loongarch/simd.md (dot_prod):
Stop using structured binding.
---
Ok for trunk?
gcc/config/loongarch/simd.md | 14 --
1 file changed, 8 insertion
When we call loongarch_reassoc_shift_bitwise for
_alsl_reversesi_extend, the mask is in DImode but we are trying
to operate on it in SImode, causing an ICE.
To fix the issue, sign-extend the mask into the mode we want. Also
specially handle the case where the mask is extended into -1 to avoid a
miss-op
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote:
> LGTM!
Pushed to trunk. The draft of gcc-14 backport is attached, I'll push it
if it builds & tests fine and there's no objection.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidia
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using an LL-SC loop.
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COM
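At the C level the idea looks roughly like this (sketch mine; atomic_or_u8 is a made-up helper and a little-endian layout is assumed): shift the byte-wide mask into place, leave the other bits 0, and let a word-sized fetch-or do the work.

#include <stdint.h>

void
atomic_or_u8 (uint8_t *p, uint8_t val)
{
  uintptr_t addr  = (uintptr_t) p;
  uint32_t *word  = (uint32_t *) (addr & ~(uintptr_t) 3);  /* containing word */
  unsigned  shift = (unsigned) (addr & 3) * 8;             /* little-endian */
  uint32_t  mask  = (uint32_t) val << shift;               /* other bits 0 */

  __atomic_fetch_or (word, mask, __ATOMIC_SEQ_CST);        /* one am*.w op */
}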
We'll use the sc.q instruction for some 16-byte atomic operations, but
it was only added in the LoongArch 1.1 evolution, so we need to gate it with
an option.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evo
They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression. The incorrect reordering has been observed in an openh264
LTO build.
Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_): Change atomic_optab to amop.
(atomic_fetch_): Likewise.
---
gcc/config/loongarch/sync.md | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/gcc/config/lo
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
insert only needs to account for the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
(UNSPEC_TI_FETCH_SUB): Likewise.
(UNSPEC_TI_FETCH_AND): Likewise.
(UNSPEC_TI_FETCH_XOR): Likewise.
(UNSPEC_TI_FETCH_OR): Likewise.
(UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Like
If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
(atomic_loadti
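A minimal source-level example of what becomes lock-free with this change (sketch mine):

__int128
load_16b (const __int128 *p)
{
  /* Naturally aligned 16-byte data cannot cross a cache line, so an LSX
     vld can serve as a lock-free atomic load here.  */
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
}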
Atomic load does not modify the memory. Atomic store does not read the
memory, thus we can use "=" instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load): Remove "+" for
the memory operand.
(atomic_store): Use "=" instead of "+" for the memory
operand.
-
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_compare_and_swapti_scq): New
define_insn.
(atomic_compare_and_swapti): New define_expand.
---
gcc/config/loongarch/sync.md | 89
1 file changed, 89 insertions(+)
diff --git a/gcc/config
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand
is expanded to a loop containing a CAS in the body, and CAS itself is an
LL-SC loop, so we have a nested loop. This is obviously not a good idea
as we just need one LL-SC loop in fact.
As ~(atom & mask) is (~mask) | (~atom), w
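A quick sanity check of that De Morgan identity (standalone program, mine):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint32_t atom = 0x12345678u, mask = 0x0000ff00u;
  /* ~(atom & mask) == (~mask) | (~atom), which is what lets fetch-nand
     be done inside a single LL-SC loop.  */
  assert (~(atom & mask) == (~mask | ~atom));
  return 0;
}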
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa
is in effect or the memory model requires a barrier on failure. But with
-mld-seq-sa and other memory models the barrier may not exist at all, and
we should remove the "b 3f" instruction as well.
The implementation uses a new operand
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_exchangeti_scq): New
define_insn.
(atomic_exchangeti): New define_expand.
---
gcc/config/loongarch/sync.md | 35 +++
1 file changed, 35 insertions(+)
diff --git a/gcc/config/loongarch/sync.m
With -mlam-bh, we should negate the addend first, and use an amadd
instruction. Disabling the expander makes the compiler do it correctly.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_fetch_sub):
Disable if ISA_HAS_LAM_BH.
---
gcc/config/loongarch/sync.md | 2 +-
1 file cha
When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use an LL-SC loop for 16-byte atomic
store.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Accept "%t" for printing the number of the 64-bit mach
If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_storeti_lsx): New
define_insn.
(at
On LoongArch, the sll.w and srl.w instructions only take bits [4:0] of
rk (the shift amount) into account, and we've already defined
SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact; thus we
don't need this instruction.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set):
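A scalar illustration of the fact being relied on (sketch mine):

unsigned int
sll_masked (unsigned int x, unsigned int n)
{
  /* sll.w only reads bits [4:0] of the shift amount; with
     SHIFT_COUNT_TRUNCATED == 1 the explicit "& 31" is known to be
     redundant and can be dropped.  */
  return x << (n & 31);
}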
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md: Use instead of .
(amo): Remove.
---
gcc/config/loongarch/sync.md | 53 +---
1 file changed, 25 insertions(+), 28 deletions(-)
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loo
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into a
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_a
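The masking itself is just this at the source level (sketch mine; names are made up):

#include <stdint.h>

uint32_t *
containing_word (uint8_t *p)
{
  /* addr & -4: clear the two low bits; bstrins.d can do this in one
     instruction, while LA32R (no bstrins) needs the mask in a register.  */
  return (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
}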
The entire patch bootstrapped and regtested on loongarch64-linux-gnu
with -march=la664, and I've also tried several simple 16-byte atomic
operation tests locally.
OK for trunk? Or maybe the clean up is OK but the 16-byte atomic
implementation still needs to be confirmed by the hardware team
We've implemented the slli + bitwise => bitwise + slli reassociation in
r15-7062. I'd hoped late combine could handle slli.d + bitwise + add.d
=> bitwise + slli.d + add.d => bitwise + alsl.d, but it does not always
work, for example
a |= 0xfff;
b |= 0xfff;
a <<= 2;
b <<= 2;
a += x;
b
On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote:
>
> 在 2025/2/22 下午3:34, Xi Ruoyao 写道:
> > Now for __builtin_popcountl we are getting things like
> >
> > vrepli.b $vr0,0
> > vinsgr2vr.d $vr0,$r4,0
> > vpcnt.d $vr0,$vr0
> >
Now for __builtin_popcountl we are getting things like
vrepli.b $vr0,0
vinsgr2vr.d $vr0,$r4,0
vpcnt.d $vr0,$vr0
vpickve2gr.du $r4,$vr0,0
slli.w $r4,$r4,0
jr $r1
The "vrepli.b" instruction is introduced by the init-regs pass (see
PR618
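A minimal reproducer for the sequence quoted above is simply (sketch mine):

int
popcount64 (unsigned long x)
{
  /* Expected to become vinsgr2vr.d + vpcnt.d + vpickve2gr.du; the extra
     vrepli.b zeroing comes from the init-regs pass.  */
  return __builtin_popcountl (x);
}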
assumptions about the rounding modes in
> floating-point
> calculations, such as in float_extend, which may prevent CSE optimizations.
> Could
> this also lead to lost optimization opportunities in other areas that don't
> require
> this option? I'm not sure.
>
> I suspect that the best approach would be to define relevant
> attributes (perhaps similar to -frounding-math) within specific related
> patterns/built-ins
> to inform optimizers we are using a rounding mode and to avoid
> over-optimization.
The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
we've not implemented. See https://gcc.gnu.org/PR34678.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
> Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
> even with -fno-pic.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
> %c4 on LoongArc
Allow (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
shift operation.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
(UNSPEC_LASX_XVSRLRI): Remove.
(lasx_xvsrari_): Remove.
(lasx_xvsrlri_): Remove.
* config/loonga
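A scalar sketch of the arithmetic being recognized (mine; imm = 7 is arbitrary, and the real patterns are the vector vsrari/vsrlri forms):

unsigned long
round_shift_right (unsigned long t)
{
  const int imm = 7;   /* arbitrary shift amount */
  /* Add half of the rounding unit before shifting right.  */
  return (t + (1ul << imm >> 1)) >> imm;
}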
On Fri, 2025-02-14 at 15:46 +0800, Lulu Cheng wrote:
> Hi,
>
> If only apply the first and second patches, the code will not compile.
>
> Otherwise LGTM.
Fixed in v3:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675776.html
--
Xi Ruoyao
School of Aerospace Science
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
This is not suitable for LASX, where lasx_xvpick has different
semantics.
gcc/ChangeLog:
* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_
Although it's just a special case of "a widening product whose
result is used for reduction," having these standard names allows
recognizing the dot product pattern earlier, and that may be beneficial to
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
Since PR116142 has been fixed, we can now add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
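A typical loop that benefits (sketch mine) is a widening multiply whose products feed a reduction; with these standard names the vectorizer can emit the {x,}vmulw{ev,od}-style sequences directly.

long
dot_product (const int *a, const int *b, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    /* Each 32x32->64 widening product feeds the reduction.  */
    sum += (long) a[i] * b[i];
  return sum;
}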
For
a = (v4si){0x, 0x, 0x, 0x}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
We have some vector instructions for operations on 128-bit integer (i.e.
TImode) vectors. Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to match the const
vectors for odd/even i
tested on loongarch64-linux-gnu, no new code
change in v3. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions to materialize const
vectors
LoongArch: Allow moving TImode vectors
LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
LoongArch: Si
Since PR116142 has been fixed, we can now add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
n test the optimal
> values
>
> for -malign-{functions,labels,jumps,loops} on that basis.
Thanks!
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
This is not suitable for LASX, where lasx_xvpick has different
semantics.
gcc/ChangeLog:
* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
Although it's just a special case of "a widening product whose
result is used for reduction," having these standard names allows
recognizing the dot product pattern earlier, and that may be beneficial to
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach used in AArch64: using a special predicate to match the const
vectors for odd/even i
We have some vector instructions for operations on 128-bit integer (i.e.
TImode) vectors. Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
For
a = (v4si){0x, 0x, 0x, 0x}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
is selected for the left operand of addsub. Swap the operands if
needed when outputting the asm.
- Fix typos in commit subjects.
- Mention V2TI in loongarch-modes.def.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions
On Thu, 2025-02-13 at 09:24 +0800, Lulu Cheng wrote:
>
> 在 2025/2/12 下午6:19, Xi Ruoyao 写道:
> > On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:
> >
> > /* snip */
> >
> > > diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> > > b
oongarch/pr118828-4.c
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-options "-mtune=la464" } */
> +
> +#include
> +#include
> +#include
> +
> +#ifndef __loongarch_tune
> +#error __loongarch_tune should not be available here
Likewise.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
>
> 在 2025/2/7 下午8:09, Xi Ruoyao 写道:
> /* snip */
> > -
> > -(define_insn "lasx_xvpickev_w"
> > - [(set (match_operand:V8SI 0 "register_operand" "=f")
> > - (vec_select:V8S
_LSX)
> - {
> - builtin_define ("__loongarch_simd");
> - builtin_define ("__loongarch_sx");
> -
> - if (!ISA_HAS_LASX)
> - builtin_define ("__loongarch_simd_width=128");
> - }
> -
> - if (ISA_HAS_LASX)
> - {
>
On Tue, 2025-02-11 at 15:49 +0800, Lulu Cheng wrote:
> It seems that the title here is "{lsx_,lasx_x}vmaddw".
Will fix in v2.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:
> Hi,
>
> I think , the "{lsx_,lasx_x}hv{add,sub}w" in the title should be
> "{lsx_,lasx_x}vh{add,sub}w".
Indeed.
>
> 在 2025/2/7 下午8:09, Xi Ruoyao 写道:
> > Like what we've done for {ls
Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR.
It's generally a good thing (allowing the use of our alsl instruction or
similar instructions on other architectures), but it's preventing us
from using bytepick. For example, if we shift a __int128 by 16 bits,
the higher word can
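A minimal example of the case described (sketch mine):

__int128
shift_left_16 (__int128 x)
{
  /* The high word is (high << 16) + (low >> 48); since r15-1120 the two
     halves are combined with PLUS rather than IOR, which no longer
     matches the bytepick pattern.  */
  return x << 16;
}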
Since PR116142 has been fixed, we can now add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach used in AArch64: using a special predicate to match the const
vectors for odd/even i
Although it's just a special case of "a widening product whose
result is used for reduction," having these standard names allows
recognizing the dot product pattern earlier, and that may be beneficial to
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvpickev_b): Remove.
(lasx_xvpickev_h): Remove.
(lasx_xvpickev_w): Remove.
(lasx_xvpickev_w_f):
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
For
a = (v4si){0x, 0x, 0x, 0x}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
We have some vector instructions for operations on 128-bit integer (i.e.
TImode) vectors. Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions to materialize const
vectors
LoongArch: Allow moving TImode vectors
LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
LoongArch: Simplify