On 2025/1/22 at 5:21 PM, Xi Ruoyao wrote:
On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote:
On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:
On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
The second source register of this insn cannot be the same as the
destination register.
gcc/ChangeLog
On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */
;; This code iterator allows unsigned and signed division to be generated
;; from the same template.
@@ -3083,39 +3084,6 @@ (define_expand "rotl3"
}
});
-;; The following templates were added to generate "bstrpick.d + alsl.d"
-;;
On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */
;; This code iterator allows unsigned and signed division to be generated
;; from the same template.
@@ -3083,39 +3084,6 @@ (define_expand "
PR target/118561
gcc/ChangeLog:
* config/loongarch/loongarch-builtins.cc
(loongarch_expand_builtin_lsx_test_branch):
NULL_RTX will not be returned when an error is detected.
(loongarch_expand_builtin): Likewise.
gcc/testsuite/ChangeLog:
* gcc.targ
On 2025/1/23 at 11:36 AM, Xi Ruoyao wrote:
On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:
On 2025/1/22 at 9:26 PM, Xi Ruoyao wrote:
The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.
gcc/ChangeLog:
PR target/118501
* config/loongarch
On 2025/1/22 at 9:26 PM, Xi Ruoyao wrote:
The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.
gcc/ChangeLog:
PR target/118501
* config/loongarch/loongarch.md (@xorsign3): Use
force_lowpart_subreg.
---
Bootstrapped and regtested on
On 2025/1/24 at 3:58 PM, Richard Sandiford wrote:
Lulu Cheng writes:
On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
The second source register of this insn cannot be the same as the
destination register.
gcc/ChangeLog:
* config/loongarch/loongarch.md
(_alsl_reversesi_extended): Add '&
LGTM!
Thanks!
On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
For bstrins, we can merge it into and3 instead of having a
separate define_insn.
For bstrpick, we can use the constraints to ensure the first source
register and the destination register are the same hardware register,
instead of emitting a move
n (struct gcc_options *opts,
case OPT_mlasx:
opts->x_la_opt_simd = val ? ISA_EXT_SIMD_LASX
- : (la_opt_simd == ISA_EXT_SIMD_LSX || la_opt_simd == ISA_EXT_SIMD_LSX
+ : (la_opt_simd == ISA_EXT_SIMD_LASX || la_opt_simd == ISA_EXT_SIMD_LSX
2. Add example to doc.
Lulu Cheng (2):
Lo
Add function attributes support for LoongArch.
Currently, the following items are supported:
__attribute__ ((target ("{no-}strict-align")))
__attribute__ ((target ("cmodel=")))
__attribute__ ((target ("arch=")))
__attribute__ ((target ("tune=")))
__attribut
The target pragmas defined correspond to the target function attributes.
This implementation is derived from AArch64.
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h
(loongarch_reset_previous_fndecl): Add function declaration.
(loongarch_save_restore_target_globals)
Pushed to r15-7092 and r15-7093.
On 2025/1/20 at 5:54 PM, Lulu Cheng wrote:
Currently, the following items are supported:
__attribute__ ((target ("{no-}strict-align")))
__attribute__ ((target ("cmodel=")))
__attribute__ ((target ("arch=")))
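A minimal sketch of how one of the attributes listed above might be applied to a single function. The macro LA_NO_STRICT_ALIGN and the function load_u32 are hypothetical names used only for illustration, and the GCC 15 version guard is an assumption based on the "Pushed to r15-..." notes in this thread. On any other compiler or target the macro expands to nothing, so the code still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro (not part of GCC): expands to the target
   attribute only on a LoongArch GCC assumed new enough to accept it.  */
#if defined(__loongarch__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 15
# define LA_NO_STRICT_ALIGN __attribute__ ((target ("no-strict-align")))
#else
# define LA_NO_STRICT_ALIGN
#endif

/* Load a 32-bit value from a possibly unaligned address.  memcpy keeps
   the access well-defined in C; with no-strict-align the compiler is
   allowed to use a single unaligned load for it.  */
LA_NO_STRICT_ALIGN
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

The attribute only changes code generation for the annotated function; the rest of the translation unit keeps the options given on the command line.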
On 2025/1/21 at 6:05 PM, Xi Ruoyao wrote:
On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */
;; This code iterator allows unsigned and signed division to be
On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
The second source register of this insn cannot be the same as the
destination register.
gcc/ChangeLog:
* config/loongarch/loongarch.md
(_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the fir
On 2025/1/21 at 4:41 PM, Lulu Cheng wrote:
On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */
;; This code iterator allows unsigned and signed division to be
generated
;; from the same template.
@@ -3083,39
Pushed to r15-6432.
On 2024/12/17 at 10:41 AM, Jiahao Xu wrote:
The hook changes the allocno class to either FP_REGS or GR_REGS depending on
the mode of the register. This results in better register allocation overall,
fewer spills and reduced code size - particularly in SPEC2017 lbm.
gcc/ChangeLog:
LGTM!
Thanks!
On 2025/1/15 at 6:09 PM, Xi Ruoyao wrote:
On 64-bit capable LoongArch hardware, alsl.wu is similar to alsl.w but
zero-extends the 32-bit result.
gcc/ChangeLog:
* config/loongarch/loongarch.md (alslsi3_extend): Add alsl.wu.
gcc/testsuite/ChangeLog:
* gcc.target/loonga
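The difference between alsl.w and alsl.wu described in the message above can be sketched in portable C. This is only a model of the 64-bit register result, not compiler or hardware code. The function names are hypothetical, and the sign-extension of the alsl.w result and the shift range of 1 to 4 are assumptions taken from my reading of the LoongArch ISA rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of (rj << sa) + rk; unsigned arithmetic wraps modulo 2^32.  */
static uint32_t alsl_low32(uint32_t rj, uint32_t rk, unsigned sa)
{
    assert(sa >= 1 && sa <= 4);   /* assumed encodable shift amounts */
    return (rj << sa) + rk;
}

/* Model of alsl.w: the 32-bit result is sign-extended to 64 bits.  */
static uint64_t model_alsl_w(uint32_t rj, uint32_t rk, unsigned sa)
{
    uint64_t low = alsl_low32(rj, rk, sa);
    /* Replicate bit 31 into the upper half without relying on
       implementation-defined signed conversions.  */
    return (low & 0x80000000u) ? (low | 0xFFFFFFFF00000000u) : low;
}

/* Model of alsl.wu: the 32-bit result is zero-extended to 64 bits.  */
static uint64_t model_alsl_wu(uint32_t rj, uint32_t rk, unsigned sa)
{
    return alsl_low32(rj, rk, sa);
}
```

The two models agree whenever bit 31 of the 32-bit result is clear and differ only in the upper 32 bits otherwise.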
On 2025/1/16 at 8:59 PM, Xi Ruoyao wrote:
On Thu, 2025-01-16 at 20:52 +0800, Xi Ruoyao wrote:
On Thu, 2025-01-16 at 20:30 +0800, Lulu Cheng wrote:
On 2025/1/15 at 6:10 PM, Xi Ruoyao wrote:
diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc
index 9d97f0216f0..3a8e1297bd3
On 2025/1/15 at 6:10 PM, Xi Ruoyao wrote:
diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc
index 9d97f0216f0..3a8e1297bd3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3929,14 +3929,31 @@ loongarch_rtx_costs (rtx x, machine
On 2025/1/8 at 11:16 PM, Xi Ruoyao wrote:
On Tue, 2025-01-07 at 10:44 +0800, Lulu Cheng wrote:
After changing this cost from 1 to 3, the performance of spec2006
401 473 416 465 482 can be improved by about 2% on LA664.
Would this fix https://gcc.gnu.org/PR114978 (or at least make it
latent)?
The
On 2025/1/24 at 7:44 PM, Richard Sandiford wrote:
Lulu Cheng writes:
On 2025/1/24 at 3:58 PM, Richard Sandiford wrote:
Lulu Cheng writes:
On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
I have no problem with this patch.
But, I have always been confused about the use of reload_completed.
I can understand that it
Hi,
If only the first and second patches are applied, the code will not compile.
Otherwise LGTM.
Thanks!
On 2025/2/13 at 5:41 PM, Xi Ruoyao wrote:
We have some vector instructions for operations on 128-bit integer, i.e.
TImode, vectors. Previously they had been modeled with unspecs, but
it's more natural
After changing this cost from 1 to 3, the performance of spec2006
401 473 416 465 482 can be improved by about 2% on LA664.
Add option '-maddr-reg-reg-cost='.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.opt.in: Add
option '-maddr-reg-reg-cost='.
* config/loongarch
Split the implementation of the function loongarch_cpu_cpp_builtins into two
parts:
1. Macro definitions that do not change (only considering 64-bit architecture)
2. Macro definitions that change with different compilation options.
gcc/ChangeLog:
* config/loongarch/loongarch-c.cc (bu
v1 -> v2:
1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE}
__loongarch_{div32,am_bh,amcas,ld_seq_sa} and
__loongarch_version_major/__loongarch_version_minor to update function.
2. Fixed PR118843.
3. Add testsuites.
v2 -> v3:
1. Modify test cases (pr118828-3.c pr118828-4.c).
PR target/118828
gcc/ChangeLog:
* config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse):
Update the predefined macros.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr118828.c: New test.
* gcc.target/loongarch/pr118828-2.c: New test.
*
PR target/118843
gcc/ChangeLog:
* config/loongarch/loongarch-c.cc
(loongarch_update_cpp_builtins): Fix macro definition issues.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr118843.c: New test.
---
gcc/config/loongarch/loongarch-c.cc | 27
gcc/ChangeLog:
* config/loongarch/loongarch-target-attr.cc
(loongarch_pragma_target_parse): Move to ...
(loongarch_register_pragmas): Move to ...
* config/loongarch/loongarch-c.cc
(loongarch_pragma_target_parse): ... here.
(loongarch_register_pragmas
LGTM!
Thanks!
On 2025/2/11 at 2:34 PM, Xi Ruoyao wrote:
Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR.
It's generally a good thing (allowing us to use our alsl instruction or
similar instructions on other architectures), but it's preventing us
from using bytepick. For example, if
, Lulu Cheng wrote:
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
/* snip */
-
-(define_insn "lasx_xvpickev_w"
- [(set (match_operand:V8SI 0 "register_operand" "=f")
- (vec_select:V8SI
- (vec_concat:V16SI
- (match_operand:V8SI 1 "register_operand"
Pushed to r15-7581.
On 2025/2/12 at 4:01 PM, Lulu Cheng wrote:
Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MB, linking with lld will definitely fail
(ld
On 2025/2/11 at 9:26 PM, Xi Ruoyao wrote:
On Tue, 2025-02-11 at 20:49 +0800, Lulu Cheng wrote:
Split the implementation of the function loongarch_cpu_cpp_builtins
into two parts:
1. Macro definitions that do not change (only considering 64-bit
architecture)
2. Macro definitions that change with
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to m
On 2025/2/12 at 3:30 AM, Xi Ruoyao wrote:
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
/* snip */
-
-(define_insn "lasx_xvpickev_w"
- [(set (match_operand:V8SI 0 "register_operand" "=f")
- (vec_select:V8
Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MB, linking with lld will definitely fail
(ld will not fail because the order of the two is different).
T
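The addressing range quoted above, [PC-128MiB, PC+128MiB-4], can be turned into a small range check. This is a sketch only: the function name fits_b26 is hypothetical, and the 4-byte alignment requirement is an assumption based on LoongArch instructions being 4 bytes wide, not something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB ((int64_t)1024 * 1024)

/* Return true if a branch displacement (target address minus PC, in
   bytes) lies in [-128 MiB, +128 MiB - 4] and is a multiple of 4,
   i.e. it could be expressed by an R_LARCH_B26 relocation under the
   assumptions stated above.  */
static bool fits_b26(int64_t offset)
{
    return offset >= -128 * MIB
           && offset <= 128 * MIB - 4
           && offset % 4 == 0;
}
```

A linker facing an offset for which this check fails would need some other mechanism (for example a different code sequence), or it has to report an error, which matches the lld failure described in the message.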
Pushed to r15-7521..r15-7524
On 2025/2/13 at 8:59 PM, Lulu Cheng wrote:
v1 -> v2:
1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE}
__loongarch_{div32,am_bh,amcas,ld_seq_sa} and
__loongarch_version_major/__loongarch_version_minor to update function.
2. Fixed PR118843.
3. Add testsuites.
Pushed to r15-7525.
On 2025/2/13 at 4:40 PM, Lulu Cheng wrote:
After changing this cost from 1 to 3, the performance of spec2006
401 473 416 465 482 can be improved by about 2% on LA664.
Add option '-maddr-reg-reg-cost='.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.o
On 2025/2/12 at 6:19 PM, Xi Ruoyao wrote:
On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:
/* snip */
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
new file mode 100644
index 000..a682ae4a356
--- /dev/null
+++ b/gcc
On 2025/2/22 at 3:34 PM, Xi Ruoyao wrote:
Now for __builtin_popcountl we are getting things like
vrepli.b $vr0,0
vinsgr2vr.d $vr0,$r4,0
vpcnt.d $vr0,$vr0
vpickve2gr.du $r4,$vr0,0
slli.w $r4,$r4,0
jr $r1
The "vrepli.b" instruction is intro
LGTM!
Thanks!
On 2025/2/14 at 9:37 PM, Xi Ruoyao wrote:
Allow (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
shift operation.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
(UNSPEC_LASX_XVSRLRI): Remove.
(lasx_xvsrari_): Remove.
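The expression from the commit message above can be tried out as plain scalar C. This sketch only illustrates the arithmetic of the expression; the function name is hypothetical, and it says nothing about how the vector instructions treat the case where the addition wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding logical right shift, written as in the commit message:
   add half of 2^imm (that is, 1 << imm >> 1) before shifting, so the
   result is rounded to nearest instead of truncated.  The addition
   wraps modulo 2^64 for very large t.  */
static uint64_t rounding_shift_right(uint64_t t, unsigned imm)
{
    assert(imm < 64);
    return (t + ((uint64_t)1 << imm >> 1)) >> imm;
}
```

For example, with imm = 2 the value 5 (1.25 after division by 4) yields 1, while 6 (1.5) and 7 (1.75) both yield 2.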
On 2025/2/19 at 3:27 PM, Xi Ruoyao wrote:
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
even with -fno-pic.
gcc/testsuite/ChangeLog:
* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
%c4 on LoongA
On 2025/2/14 at 8:21 PM, Xi Ruoyao wrote:
This series is intended to fix some test failures on
vect-reduc-chain-*.c by adding the [su]dot_prod* expand for LSX and LASX
vector modes. But the code base of the related instructions was not
readable, so clean it up first (using the approach learnt from AAr
LGTM!
Thanks.
On 2025/3/3 at 3:29 PM, Xi Ruoyao wrote:
They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression. The incorrect reorder has been observed in openh264 LTO
build.
Expand them to a (mem) expressio
On 2025/3/5 at 11:03 AM, Xi Ruoyao wrote:
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote:
LGTM!
Pushed to trunk. The draft of gcc-14 backport is attached, I'll push it
if it builds & tests fine and there's no objection.
Thanks a lot.
On 2025/3/7 at 2:37 PM, Lulu Cheng wrote:
On 2025/2/14 at 8:21 PM, Xi Ruoyao wrote:
Although it's just a special case of "a widening product of which the
result is used for reduction," having these standard names allows the
dot product pattern to be recognized earlier and it may be beneficial to
opti
The issue is the same as 12383255fe4e82c31f5e42c72a8fbcb1b5dea35d.
.REDUC_PLUS is not set for V2SImode on LoongArch either, so add it
to the list of targets not expecting BB vectorization.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-77.c: Add loongarch*-*-* to the list
of expected
By default, vectorization is not enabled on LoongArch,
resulting in the failure of these two test cases.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr112325.c: Add the vector compilation
option '-mlsx' for LoongArch.
* gcc.dg/vect/pr117888-1.c: Likewise.
---
gcc/testsuite/gc
On 2025/2/14 at 8:21 PM, Xi Ruoyao wrote:
Although it's just a special case of "a widening product of which the
result is used for reduction," having these standard names allows the
dot product pattern to be recognized earlier and it may be beneficial to
optimization. Also fix some test failures with the test
LGTM, but since we're now in stage 4, I believe it should be merged during
GCC 16 stage 1.
Thanks!
On 2025/3/1 at 11:38 AM, Xi Ruoyao wrote:
We've implemented the slli + bitwise => bitwise + slli reassociation in
r15-7062. I'd hoped late combine could handle slli.d + bitwise + add.d
=> bitwise + slli.d
After d34cda720988674bcf8a24267c9e1ec61335d6de, what was originally
not vectorizable can now be vectorized. So adjust
gcc.dg/vect/slp-26.c.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-26.c: Adjust.
---
gcc/testsuite/gcc.dg/vect/slp-26.c | 4 ++--
1 file changed, 2 insertions(+), 2 delet
LGTM.
Thanks.
On 2025/3/10 at 2:40 PM, Xi Ruoyao wrote:
When we call loongarch_reassoc_shift_bitwise for
_alsl_reversesi_extend, the mask is in DImode but we are trying
to operate on it in SImode, causing an ICE.
To fix the issue, sign-extend the mask into the mode we want. Also
specially handle the
.
Thanks.
On 2025/1/7 at 8:45 PM, Zhou Zhao wrote:
On 2025/1/7 at 7:49 PM, Lulu Cheng wrote:
On 2025/1/2 at 5:46 PM, Zhou Zhao wrote:
If an SImode register is left-shifted twice in succession, combine the
related instructions into one.
gcc/ChangeLog:
* config/loongarch/loongarch.md (extsv_ashlsi3):
New template
Hi