---
gcc/ira-build.cc | 7 ---
gcc/ira-color.cc | 8
gcc/ira-emit.cc | 12 ++--
gcc/ira-lives.cc | 7 ---
gcc/ira.cc | 19 ---
5 files changed, 30 insertions(+), 23 deletions(-)
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..2
---
gcc/Makefile.in | 1 +
gcc/df-problems.cc | 886 ++-
gcc/df.h | 159 +++
gcc/regs.h | 5 +
gcc/sbitmap.cc | 98 +
gcc/sbitmap.h | 2 +
gcc/subreg-live-range.cc | 233 ++
---
gcc/common.opt | 4 ++++
gcc/common.opt.urls | 3 +++
gcc/doc/invoke.texi | 8 ++++++++
gcc/opts.cc | 1 +
4 files changed, 16 insertions(+)
diff --git a/gcc/common.opt b/gcc/common.opt
index 40cab3cb36a..5710e817abe 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2163,6 +21
---
gcc/lra-coalesce.cc | 27 +++-
gcc/lra-constraints.cc | 109 ++---
gcc/lra-int.h | 4 +
gcc/lra-lives.cc | 357 -
gcc/lra-remat.cc | 8 +-
gcc/lra-spills.cc | 27 +++-
gcc/lra.cc | 10 +-
7 files ch
subreg liveness tracking in the
followup patches.
Bootstrapped and regtested on x86-64 with no regressions.
Co-authored-by: Lehua Ding
Juzhe-Zhong (4):
DF: Add -ftrack-subreg-liveness option
DF: Add DF_LIVE_SUBREG problem
IRA: Apply DF_LIVE_SUBREG data
LRA: Apply DF_LIVE_SUBREG data
gcc/M
This patch adds a new DF problem named DF_LIVE_SUBREG. The problem
extends the DF_LR problem and supports tracking the subreg liveness
of multireg pseudos if a pseudo satisfies the following conditions:
1. its mode size is greater than its REGMODE_NATURAL_SIZE;
2. the reg is used in insns
x86-64 no regression.
Co-authored-by: Lehua Ding
Juzhe-Zhong (4):
DF: Add -ftrack-subreg-liveness option
DF: Add DF_LIVE_SUBREG problem
IRA: Apply DF_LIVE_SUBREG data
LRA: Apply DF_LIVE_SUBREG data
gcc/Makefile.in | 1 +
gcc/common.opt | 4 +
gcc/common.opt.
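As a hedged sketch of condition 1 above (the helper name and exact interface
are assumptions, not the patch's actual code), the check amounts to:

/* A pseudo is a candidate for subreg liveness tracking when its mode
   spans multiple hard registers, i.e. its size exceeds the natural
   register size for that mode.  Hypothetical helper.  */
static bool
multireg_tracking_candidate_p (machine_mode mode)
{
  return maybe_gt (GET_MODE_SIZE (mode), REGMODE_NATURAL_SIZE (mode));
}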
This patch applies DF_LIVE_SUBREG data to the LRA pass. More changes were made
to LRA than to IRA since LRA modifies the DF data directly.
The biggest changes are centered on lra-lives.cc.
Co-authored-by: Lehua Ding
gcc/ChangeLog:
* lra-coalesce.cc (update_live_info): App
This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.
Co-authored-by: Lehua Ding
gcc/ChangeLog:
* ira-build.cc (create_bb_allocnos): Apply DF_LIVE_SUBREG data.
(create_loop_allocnos): Ditto.
* ira-color.c
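A minimal before/after sketch of the substitution (the surrounding use site is
hypothetical; only the accessor names come from the patch and existing df.h):

/* Before: per-BB live-in sets from the DF_LR problem.  */
bitmap live_in = df_get_live_in (bb);
/* After: subreg-aware live-in sets from the DF_LIVE_SUBREG problem.  */
bitmap live_in = df_get_subreg_live_in (bb);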
Add a new flag, -ftrack-subreg-liveness, to enable subreg liveness tracking.
This flag is enabled at -O3/-Ofast.
Co-authored-by: Lehua Ding
gcc/ChangeLog:
* common.opt: Add -ftrack-subreg-liveness option.
* common.opt.urls: Ditto.
* doc/invoke.texi: Ditto.
* opts.cc: Ditt
This patch fixes an issue reported by Jeff.
Testing is running. OK for trunk if testing passes with no regressions?
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Fix infinite
compilation.
(pre_vsetvl::remove_vsetvl_pre_insns): Ditto.
---
gcc/conf
This patch fixes the following:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,zero
vse
Realized in a recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(re
I realized there is an RTL regression between GCC-14 and GCC-13.
https://godbolt.org/z/Ga7K6MqaT
GCC-14:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUI
Update in v2: Add dump information.
This patch fixes the following ineffective vsetvl insertion:
#include "riscv_vector.h"
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond,
size_t cond2)
{
for (size_t i = 0; i < n; i++)
{
if (i == cond) {
vint8mf8
This patch fixes the following ineffective vsetvl insertion:
#include "riscv_vector.h"
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond,
size_t cond2)
{
for (size_t i = 0; i < n; i++)
{
if (i == cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
This patch targets GCC-15.
Consider the following case:
unsigned int
single_loop_with_if_condition (unsigned int *restrict a, unsigned int *restrict
b,
unsigned int *restrict c, unsigned int loop_size)
{
unsigned int result = 0;
for (unsigned int i = 0; i < lo
Due to recent middle-end loop vectorizer changes, these tests regressed,
and
the changes are reasonable. Adapt the tests to fix the regressions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/shift-rv
The compile time issue was discovered in SPEC 2017 wrf:
Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf
compilation.
Before this patch (Lazy vsetvl):
scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%)
machine dep reorg
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
Refine some code.
(pre_vsetvl::emit_vsetvl): Ditto.
---
gcc/config/riscv/riscv-vsetvl.cc | 69 +---
1 file changed, 27 insertions(+), 42 deletions(-)
diff --git a
This patch fixes the recently noticed bug in RV32 glibc.
We incorrectly deleted a vsetvl:
...
and a4,a4,a3
vmv.v.i v1,0 ---> Missed vsetvl causes an illegal
instruction report.
vse8.v v1,0(a5)
The root cause is that the LATERIN computed by LCM is incorrect.
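For context, GCC's lazy code motion (gcc/lcm.cc) derives its deletion set from
LATERIN, so an incorrect LATERIN directly yields a wrongly deleted vsetvl. A
rough sketch of the relation (standard LCM background, not the patch's code):

/* In gcc/lcm.cc, an expression locally anticipatable in a block is
   deleted there when it is not needed "later":
     DELETE (bb) = ANTLOC (bb) & ~LATERIN (bb)
   where LATERIN (bb) is the intersection of LATER over bb's incoming
   edges.  An underapproximated LATERIN therefore enlarges DELETE and
   can remove a vsetvl that is still required.  */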
While looking into PR113469, I noticed LCM deletes a vsetvl incorrectly.
This patch adds dump information for all predecessors of the block where LCM
deletes a vsetvl, for better debugging.
Tested with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function.
Noticed that full availability is computed every round of earliest fusion,
which is redundant.
Actually we only need to compute it once in phase 3.
It's an NFC patch and tested with no regressions. Committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data):
Remove redu
This patch adds a no-fusion compile option to disable phase 2 global fusion.
It can help us analyze compile time and debug.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add
optim-no-fusion option.
* config/riscv/riscv-vsetvl.cc (pa
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL
PASS;
that is, the VSETVL PASS consumes over 33 GB of memory, which makes it
impossible
to compile SPEC 2017 wrf on a laptop.
The root cause is memory-wasting variables:
unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap
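A back-of-the-envelope illustration of the blow-up; the counts below are
invented for illustration, the post does not give wrf's actual numbers:

#include <cstdio>

int main ()
{
  /* Illustrative sizes: a huge wrf function with ~50k basic blocks.  */
  unsigned long long num_bbs = 50000, num_regs = 100;
  unsigned long long num_exprs = num_bbs * num_regs; /* bits per sbitmap row */
  unsigned long long bytes = num_bbs * num_exprs / 8; /* one row per block */
  printf ("%llu GB\n", bytes >> 30); /* ~29 GB for a single such array */
  return 0;
}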
This patch fixes the recent regression:
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in
reg_or_sub
Noticed on an AI benchmark that GCC has a 3% performance drop versus Clang.
It's because Clang/LLVM has a simplification that transforms vmv.v.x (avl = 1)
into
vmv.s.x.
Since vmv.s.x has a more flexible vsetvl demand than vmv.v.x, it allows us to
have
better chances to fuse vsetvl.
Consider this followi
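The transform can be pictured with intrinsics; a reduced illustration, not
taken from the patch (with AVL = 1 both forms write only element 0):

#include "riscv_vector.h"

/* vmv.v.x with AVL = 1 splats x into the single active element...  */
vint32m1_t as_splat (int32_t x) { return __riscv_vmv_v_x_i32m1 (x, 1); }
/* ...which matches the scalar insert vmv.s.x, whose weaker vsetvl
   demand gives more fusion opportunities.  */
vint32m1_t as_insert (int32_t x) { return __riscv_vmv_s_x_i32m1 (x, 1); }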
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand the ratio instead
of demanding sew_lmul.
But my previous typo made the VSETVL PASS fail to honor the RISC-V V spec.
Consider the following simple case:
int foo4 (void * in, void * out)
{
vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
v = __r
../../gcc/config/riscv/riscv.cc: In function 'void
riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)':
../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl'
[-Werror=unused-parameter]
4879 | tree fndecl,
|
This patch fixes a memory hog found in the SPEC2017 wrf benchmark, caused by
RVV_VLMAX, since RVV_VLMAX generates a brand-new rtx via gen_rtx_REG (Pmode,
X0_REGNUM)
every time we call RVV_VLMAX; that is, we are always generating garbage and
redundant
(reg:DI 0 zero) rtx.
After this patch fix, the memo
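The shape of such a fix is presumably to hand out one cached rtx instead of
generating a fresh one per call; a hedged sketch with hypothetical names, not
the patch's actual code:

/* Cache the (reg:DI 0 zero) rtx once; gen_rtx_REG otherwise allocates
   a fresh GC object on every call.  */
static GTY(()) rtx cached_vlmax_rtx;

static rtx
get_vlmax_rtx ()
{
  if (!cached_vlmax_rtx)
    cached_vlmax_rtx = gen_rtx_REG (Pmode, X0_REGNUM);
  return cached_vlmax_rtx;
}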
While running various benchmarks, I noticed we miss vi variant support for
integer comparison.
That is, we can vectorize code into vadd.vi, but we can't vectorize into
vmseq.vi.
Consider the following case:
void
foo (int n, int **__restrict a)
{
int b;
int c;
int d;
for (b = 0; b < n; b+
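A reduced example of the missed form (hypothetical code, not the benchmark's):
a comparison against a small immediate should become vmseq.vi, just as
'a[i] + 5' becomes vadd.vi:

/* Expected to vectorize the compare into vmseq.vi v0,v8,5 rather than
   splatting 5 into a vector and using vmseq.vv.  */
void
cmp_imm (int *r, int *a, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] == 5;
}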
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128.
---
gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3
V3: Rebase to trunk and commit it.
This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing
compatibility check
for conflict vsetvl fusion.
Buggy assembler before this patch:
.L69:
vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,
This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing
compatibility check
for conflict vsetvl fusion.
Buggy assembler before this patch:
.L69:
vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-43.c: Add vect128.
---
gcc/testsuite/gcc.dg/vect/bb-slp-43.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
b/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
index dad2d24262d..8aedb06bf72 100
Noticed a recent regression:
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "optimized: basic block" 2
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized:
basic block" 2
Checked on both ARM SVE and RVV:
https://godbo
Recently noticed there is an XPASS on RISC-V:
XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2
"vector operands from scalars"
XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not slp2 "vector operands from
scalars"
And checked both ARM SVE and RVV:
https://godbolt.org/
As PR113404 mentioned: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113404
We have an ICE when we enable RVV in big-endian mode:
during RTL pass: expand
a-float-point-dynamic-frm-66.i:2:14: internal compiler error: in to_constant,
at poly-int.h:588
0xab4c2c poly_int<2u, unsigned short>::to_constant
This patch fixes a -70% performance drop from GCC-13.2 to GCC-14 with
-march=rv64gcv on real hardware.
The root cause is that an incorrect cost model causes inefficient
vectorization, which
makes performance drop significantly.
So this patch does:
1. Adjust vector to scalar cost by introducing v to sca
Rebase in v3: Rebase to trunk and commit it as it's approved by Robin.
Update in v2: Add dynamic LMUL test.
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
Add more dump checks to robustify the tests.
Committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: Add dump check.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: Ditto.
* gcc.target/r
This patch fixes the following FAILs:
Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c -O0 execution test
FAIL: gcc.c-torture/execute/pr68532.c -O1 execution test
FAIL: gcc.c-torture/execut
This patch fixes a -70% performance drop from GCC-13.2 to GCC-14 with
-march=rv64gcv on real hardware.
The root cause is that an incorrect cost model causes inefficient
vectorization, which
makes performance drop significantly.
So this patch does:
1. Adjust vector to scalar cost by introducing v to sca
Update in v2: Add dynamic LMUL test.
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Trunk GCC:
vsetvli a5,zero,e8,mf2,ta,ma
li
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Trunk GCC:
vsetvli a5,zero,e8,mf2,ta,ma
li a4,-32768
vid.v v1
1. Introduce new vector regmove tune info.
2. Adjust scalar_to_vec cost in add_stmt_cost.
We will get optimal codegen after this patch with -march=rv64gcv_zvl256b:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Tested on both RV32/R
This test should pass no matter how we adjust the cost model.
Remove -fno-vect-cost-model.
Committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/fold-min-poly.c: Remove
-fno-vect-cost-model
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/fold-min-poly.c | 2 +-
1 file changed
1. This patch sets the scalar_to_vec cost to 2 instead of 1 since a scalar move
instruction is slightly more costly than normal RVV instructions (e.g.
vadd.vv).
2. Adjust the scalar_to_vec cost accurately according to the splat value; for
example,
a value like 32872 needs 2 more scalar instructions:
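As an illustration of point 2 (reduced example; only the constant 32872 comes
from the post): 32872 = 0x8068 does not fit a 12-bit immediate, so the splat
needs lui + addiw before vmv.v.x, while a small constant like 15 splats
directly via vmv.v.i:

/* Splatting 32872 costs two extra scalar instructions:
     lui   a4,0x8      # a4 = 32768
     addiw a4,a4,104   # a4 = 32872
     vmv.v.x v1,a4
   whereas splatting 15 is a single vmv.v.i.  */
void
splat_costly (int *r, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = 32872;
}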
This patch fixes the following inefficient vectorized code:
vsetvli a5,zero,e8,mf2,ta,ma
li a2,17
vid.v v1
li a4,-32768
vsetvli zero,zero,e16,m1,ta,ma
addiw a4,a4,104
vmv.v.i v3,15
lui a1,%hi(a)
li a0,1
This patch fixes the known issues in SLP cases:
ble a2,zero,.L11
addiw t1,a2,-1
li a5,15
bleu t1,a5,.L9
srliw a7,t1,4
slli a7,a7,7
lui t3,%hi(.LANCHOR0)
lui a6,%hi(.LANCHOR0+128)
addi t3,t3,%lo(.L
This patch is a preparatory patch for the following cost model tweak.
Since we don't have a vector cost model in the default tune info (rocket),
we set the cost model default to the generic cost model.
The reason we want to switch to the generic vector cost model is that the
default cost model generates infe
This patch is inspired by LLVM patches:
https://github.com/llvm/llvm-project/pull/76550
https://github.com/llvm/llvm-project/pull/77473
Use vaaddu for AVG vectorization.
Before this patch:
vsetivlizero,8,e8,mf2,ta,ma
vle8.v v3,0(a1)
vle8.v v2,0(a2)
vwadd
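The AVG pattern being targeted looks like the reduced sketch below
(illustrative code): with the rounding mode set to round-to-nearest-up,
vaaddu.vv computes (a + b + 1) >> 1 in one instruction, with no widening:

/* Rounding average: vectorizable as a single vaaddu.vv (vxrm = rnu)
   instead of a widening add plus narrowing shift.  */
void
avg_round (unsigned char *r, unsigned char *a, unsigned char *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (a[i] + b[i] + 1) >> 1;
}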
v2 update: Robustify tests.
While working on the cost model, I noticed one case where the dynamic LMUL
cost doesn't work well.
Before this patch:
foo:
lui a4,%hi(.LANCHOR0)
li a0,1953
li a1,63
addi a4,a4,%lo(.LANCHOR0)
li a3,64
vsetvli
While working on the cost model, I noticed one case where the dynamic LMUL
cost doesn't work well.
Before this patch:
foo:
lui a4,%hi(.LANCHOR0)
li a0,1953
li a1,63
addi a4,a4,%lo(.LANCHOR0)
li a3,64
vsetvli a2,zero,e32,mf2,ta,ma
While working on refining the cost model, I noticed this test will generate
unexpected
scalar xor instructions if we don't tune the cost model carefully.
Add more assembler checks to avoid future regressions.
Committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: A
We have supported segment load/store intrinsics.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-functions.def (vleff): Move
comments.
(vundefined): Ditto.
---
gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++--
1 file changed, 2 inse
We have supported segment load/store intrinsics.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-functions.def (vleff): Move
comments to real place.
(vcreate): Ditto.
---
gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +---
1 file chan
As Robin suggested, remove the gimple_uid check, which is sufficient for our
needs.
Tested on both RV32/RV64 with no regressions, OK for trunk?
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop
invariant check.
---
gcc/config/riscv/riscv-vector-costs.cc | 2 +-
Obvious fix, Committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Replace std::max with MAX.
---
gcc/config/riscv/riscv-vsetvl.cc | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7d748edc
This patch fixes a bug in the VSETVL PASS in the following situation:
Ignore curr info since prev info available with it:
prev_info: VALID (insn 8, bb 2)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=16, VLMUL=mf4, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, M
1) We not only have vashl_optab, vashr_optab, and vlshr_optab, which vectorize
shifts with a vector shift amount,
that is, vectorization of 'a[i] >> x[i]' where the shift amount is loop
variant.
2) But we also have ashl_optab, ashr_optab, and lshr_optab, which can vectorize
shifts with a scalar shift amount,
that is
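Reduced examples of the two forms (illustrative code, matching the
'a[i] >> x[i]' shape quoted above):

/* 1) Vector shift amount (loop-variant): vashr_optab, maps to vsra.vv.  */
void
shift_by_vector (int *r, int *a, int *amt, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] >> amt[i];
}

/* 2) Scalar shift amount (loop-invariant): ashr_optab, maps to vsra.vx.  */
void
shift_by_scalar (int *r, int *a, int amt, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] >> amt;
}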
V2: Address comments from Robin.
While working on fixing a bug, I noticed the following code has a redundant move:
#include "riscv_vector.h"
void
f (float x, float y, void *out)
{
float f[4] = { x, x, x, y };
vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4);
__riscv_vse32_v_f32m1 (out, v, 4);
}
While working on fixing a bug, I noticed the following code has a redundant move:
#include "riscv_vector.h"
void
f (float x, float y, void *out)
{
float f[4] = { x, x, x, y };
vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4);
__riscv_vse32_v_f32m1 (out, v, 4);
}
Before this patch:
f:
vs
1) We not only have vashl_optab, vashr_optab, and vlshr_optab, which vectorize
shifts with a vector shift amount,
that is, vectorization of 'a[i] >> x[i]' where the shift amount is loop
variant.
2) But we also have ashl_optab, ashr_optab, and lshr_optab, which can vectorize
shifts with a scalar shift amount,
that is
Consider the following case:
void
f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n)
{
for (int i = 0; i < n; i++)
{
int tmp = b[i] + 15;
int tmp2 = tmp + b[i];
c[i] = tmp2 + b[i];
d[i] = tmp + tmp2 + b[i];
}
}
Current dynamic LMUL cos
Noticed a case with "Maximum lmul = 16", which is incorrect.
Correct the LMUL estimation for MASK_LEN_LOAD/MASK_LEN_STORE.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (variable_vectorized_p): New
function.
(compute_nregs_for_mode): Refine LMUL.
(max_number_of
Fix the indentation of some code to align at 8 spaces.
Committed.
gcc/ChangeLog:
* config/riscv/vector.md: Fix indent.
---
gcc/config/riscv/vector.md | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
in
As in PR113206 and PR113209, the bugs happen in the following situation:
li a4,32
...
vsetvli zero,a4,e8,m8,ta,ma
...
slliw a4,a3,24
sraiw a4,a4,24
bge a3,a1,.L8
sb a4,%lo(e)(a0)
vsetvli zero,a4,e8,m8,ta,ma --
As in PR113206, the bug happens in the following situation:
li a4,32
...
vsetvli zero,a4,e8,m8,ta,ma
...
slliw a4,a3,24
sraiw a4,a4,24
bge a3,a1,.L8
sb a4,%lo(e)(a0)
vsetvli zero,a4,e8,m8,ta,ma --> a4 is pollu
In
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d1eacedc6d9ba9f5522f2c8d49ccfdf7939ad72d
I optimized the COND_LEN_xxx pattern with dummy len and dummy mask using a too
simple solution, which
causes a redundant vsetvli in the following case:
vsetvli a5,a2,e8,m1,ta,ma
vle32.v v8,0(a0)
This patch fixes the following situation:
vl4re16.v v12,0(a5)
...
vl4re16.v v16,0(a3)
vs4r.v v12,0(a5)
...
vl4re16.v v4,0(a0)
vs4r.v v16,0(a3)
...
vsetvli a3,zero,e16,m4,ta,ma
...
vmv.v.x v8,t6
vmsgeu.vv v2,v16,v8
vsub.vv v16,v16,v8
vs4r.v v16,0(a5)
...
vs4r.v v4,0(a0)
v
Committed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc: Move STMT_VINFO_TYPE (...) to
local.
---
gcc/config/riscv/riscv-vector-costs.cc | 9 -
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/gcc/config/riscv/riscv-vector-costs.cc
b/gcc/config/riscv/riscv-
The redundant dump check is fragile, easily changed, and not necessary.
Tested on both RV32/RV64 with no regressions.
Removed it and committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr
This patch fixes the following case of choosing an unexpectedly big LMUL,
which causes
register spilling.
Before this patch, choosing LMUL = 4:
addi sp,sp,-160
addiw t1,a2,-1
li a5,7
bleu t1,a5,.L16
vsetivli zero,8,e64,m4,ta,ma
vmv.v.x v4,a0
Noticed the current dynamic LMUL is not accurate for conversion code.
Refine it; a current case is changed from choosing LMUL = 4 to
LMUL = 8.
Tested with no regressions, committed.
Before this patch (LMUL = 4): After this patch (LMUL = 8):
lw a7,56(sp)
Consider the following case:
int f[12][100];
void bad1(int v1, int v2)
{
for (int r = 0; r < 100; r += 4)
{
int i = r + 1;
f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
f[0][r+2] = f[1][r+2] * (f[2][r+2]) -
Noticed we have the following situation:
vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since
VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,f
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo.
---
.../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.
Tweak some code in the dynamic LMUL cost model to make the computation more
predictable and accurate.
Tested on both RV32 and RV64 no regression.
Committed.
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak
LMUL estimation.
Currently, we compute RVV V_REGS liveness during better_main_loop_than_p,
which is not the appropriate
time to do that: for example, when the code will finally pick the LMUL = 8
vectorization
factor, we compute liveness for LMUL = 8 multiple times, which is redundant.
Since we have leverage
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Add one more ASM check.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
b/gcc/testsuite/
Consider the following case:
foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v v8
vs8r.v v8,0(sp) ---> spill
.L
When working on evaluating x264 performance, I noticed the best LMUL for such
a case with -march=rv64gcv is LMUL = 2
LMUL = 1:
x264_pixel_8x8:
add a4,a1,a2
addi a6,a0,16
vsetivli zero,4,e8,mf4,ta,ma
add a5,a4,a2
vle8.v v12,0(a6)
vle
This patch fixes the following ICE in full coverage testing of RV32.
Running target
riscv-sim/-march=rv32gc_zve32f/-mabi=ilp32d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic
FAIL: gcc.c-torture/compile/930120-1.c -O2 (internal compiler error: in
emit_move_insn, at expr.cc:4606)
FAIL: gcc.c-t
While trying to fix bugs for PR113097, I noticed the following situation where
we generate a redundant vsetvli
_255 = SELECT_VL (3, POLY_INT_CST [4, 4]);
COND_LEN (..., _255)
Before this patch:
vsetivli a5, 3...
...
vadd.vv (use a5)
After this patch:
...
vadd.vv (use AVL = 3)
The reason we can do this i
This patch fixes bugs in the fusion of the following case:
li a5,-1
vmv.s.x v0,a5 -> demand any non-zero AVL
vsetvli a5, ...
Incorrect fusion after VSETVL PASS:
li a5,-1
vsetvli a5...
vmv.s.x v0, a5 --> a5 is modified as incorrect value.
We disallow this incorrect fusion above.
Full coverage
Due to recent VLS mode changes (changes fixing an ICE and a run-time FAIL),
the dump check is the same as ARM SVE now. So adapt the test for RISC-V.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-cond-1.c: Adapt for RISC-V.
---
gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
1 file changed, 2
Hi, this patch fixes the following regression FAILs on RVV:
XPASS: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2
"vector operands from scalars"
XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not sl
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Use builder.inner_mode
().
---
gcc/config/riscv/riscv-v.cc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index d1eb7a0a9a5..486f5deb296 1
Fix this FAIL:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c scan-tree-dump-times
vect "Maximum lmul = 2" 1
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c | 2 +-
1
Since
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e7abd09621a4401d44f4513adf126bce4b4828b
we only allow VLS modes with size <= TARGET_MIN_VLEN * TARGET_MAX_LMUL.
So with -march=rv64gcv and default LMUL = 1, we don't have VLS modes of
256/512/1024 vectors.
Disable them in the vect tests, which fixes the