gcc/ChangeLog:
* config/riscv/autovec.md (@vec_series): Remove @.
(vec_series): Ditto.
* config/riscv/riscv-v.cc (expand_const_vector): Ditto.
(shuffle_decompress_patterns): Ditto.
---
gcc/config/riscv/autovec.md | 2 +-
gcc/config/riscv/riscv-v.cc | 6 +++---
2 f
This patch enables almost full coverage vectorization tests for RVV, except the
following tests (not enabled yet):
1. Will enable soon:
check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
Fix the following XPASS FAILs of TSVC for RVV:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump
vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree
Previously, I removed the movmisalign pattern to fix the execution FAILs in
this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
I initially thought that RVV doesn't allow misaligned accesses, so I removed
that pattern.
However, after deep investigatio
As this showed: https://godbolt.org/z/3K9oK7fx3
ARM SVE shows FOLD_EXTRACT_LAST 2 times whereas RVV shows it 4 times.
This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and
fold_extract_last fail in the first analysis.
Then we succeed in the second analysis.
So RVV has 4 times of
RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
case well.
So, adjust dump check for RVV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
---
gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++--
1 file changed, 2 insert
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-cond-1.c: Fix dump FAIL for RVV.
* gcc.dg/vect/pr57705.c: Ditto.
---
gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
gcc/testsuite/gcc.dg/vect/pr57705.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git
Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer-loop
vectorization.
Fix the following XPASS FAILs:
XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER L
This patch fixes the following FAILs in regression testing:
FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times
vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-
Reference: https://godbolt.org/z/G9jzf5Grh
RVV is able to vectorize this case using SLP. However, with
-fno-vect-cost-model, RVV vectorizes it by vec_load_lanes with stride 6.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
---
gcc/testsuite/gcc.dg/vect/
RVV uses load_lanes with stride = 5 to vectorize this case with
-fno-vect-cost-model instead of SLP.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr45752.c: Adapt dump check for target supports
load_lanes with stride = 5.
---
gcc/testsuite/gcc.dg/vect/pr45752.c | 2 +-
1 file changed, 1 insertio
These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP
with -fno-vect-cost-model.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports
load_lanes with stride = 8.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr9
This case is vectorized by stride8 load_lanes.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.
---
gcc/testsuite/gcc.dg/vect/slp-12a.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c
b/gcc/testsuit
Like ARM SVE, RVV is vectorizing these 2 cases in the same way.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
* gcc.dg/vect/slp-perm-10.c: Ditto.
---
gcc/testsuite/gcc.dg/vect/slp-23.c | 2 +-
gcc/testsuite/gcc.dg/vect/slp-perm-10.c | 2 +-
2 files
RVV vectorizes it with stride5 load_lanes.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
---
gcc/testsuite/gcc.dg/vect/slp-perm-4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
b/gcc/
RVV vectorizes this case with stride8 load_lanes.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
---
gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc
These cases don't check SLP on targets that support load_lanes.
Add a vectorization check for those situations.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr97832-2.c: Add vectorization check.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.
---
gcc/testsuite/gcc.dg/v
For RVV, VLS modes are enabled according to TARGET_MIN_VLEN,
from M1 to M8.
For example, when TARGET_MIN_VLEN = 128 bits, we enable
128/256/512/1024 bits VLS modes.
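The arithmetic behind that example can be sketched as below. This is only an illustration: vls_sizes_in_bits is a hypothetical helper, not a GCC function, and it assumes the enabled sizes are simply the minimum VLEN scaled by LMUL = 1, 2, 4, 8.

```c
#include <stddef.h>

/* Hypothetical helper, not GCC code: fill OUT with the vector sizes (in
   bits) obtained by scaling MIN_VLEN by LMUL = 1, 2, 4 and 8.
   Returns the number of entries written (always 4).  */
size_t
vls_sizes_in_bits (unsigned min_vlen, unsigned out[4])
{
  static const unsigned lmul[4] = { 1, 2, 4, 8 };
  for (size_t i = 0; i < 4; ++i)
    out[i] = min_vlen * lmul[i];
  return 4;
}
```

Calling it with min_vlen = 128 fills the array with 128, 256, 512 and 1024, matching the sizes listed above.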
This patch fixes the following FAIL:
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "optim
Like GCN, RVV also has 64-byte vectors (512 bits), which cause a FAIL in this
test.
It's more reasonable to use "vect512" instead of AMDGCN.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-68.c: Use vect512.
---
gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
1 file changed, 1 insertion(+),
Here is the reference comparing dump IR between ARM SVE and RVV.
https://godbolt.org/z/zqess8Gss
We can see RVV has one more dump IR:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024 bit vectors.
The codegen is reasonably good.
However, I saw GCN also has 1024 bi
This patch fixes the following 2 FAILs in RVV regression, since the check is not
accurate.
It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/
gcc/testsuite/ChangeLog:
* gcc.dg/vect/no-scevccp-outer-7.c:
Like GCN, add -fno-tree-vectorize.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/predcom-2.c: Add riscv.
---
gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
b/gcc/testsuite/gcc.dg/tree-
Add testcase for PR111751 which has been fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html
PR target/111751
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111751.c: New test.
---
.../gcc.target/riscv/rvv/autovec/pr111751.c | 55 +
Richard's patch resolves PR111751:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65
which causes an ICE in RISC-V regression:
FAIL: gcc.dg/torture/pr53144.c -O2 (internal compiler error: in
gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/tortur
This test is testing the fold_extract_last pattern, so it's more reasonable to
use vect_fold_extract_last instead of specifying targets.
This is the vect_fold_extract_last property:
proc check_effective_target_vect_fold_extract_last { } {
return [expr { [check_effective_target_aarch64_sve]
As Richard suggested:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html
Add vect_ext_char_longlong to fix FAIL for RVV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV.
* lib/target-supports.exp: Add vect_ext_char_longlong proper
Like previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.
---
gcc/test
Confirm RISC-V is able to CSE this case no matter whether we enable RVV or not.
Remove XFAIL, to fix:
XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv.
---
gcc/testsuite/gcc.dg/tree-ss
I have analyzed all existing FAILs.
Only the following FAILs still need to be addressed:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump
optimized " =
I suddenly discovered I made a mistake that was luckily not exposed.
https://godbolt.org/z/c3jzrh7or
GCC is using 32 bit index offset:
vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v v1,(a1),v1
This is wrong since v1 may overflow 32 bits after vsll.vi.
After this patch:
v
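The overflow described above can be modelled in scalar C. This is a minimal sketch, assuming 4-byte elements (hence the shift by 2, as in vsll.vi v1,v1,2); offset32 and offset64 are hypothetical names, and the sketch does not show what the patch actually emits.

```c
#include <stdint.h>

/* Byte offset computed entirely in 32 bits: the shift can wrap around.  */
uint32_t
offset32 (uint32_t index)
{
  return index << 2;
}

/* Byte offset computed after widening the index to 64 bits first.  */
uint64_t
offset64 (uint32_t index)
{
  return (uint64_t) index << 2;
}
```

For an index of 0x40000000 the 32-bit version wraps to 0, while the widened version keeps the full offset 0x100000000.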
I suddenly discovered I made a mistake that was luckily not exposed.
https://godbolt.org/z/c3jzrh7or
GCC is using 32 bit index offset:
vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v v1,(a1),v1
This is wrong since v1 may overflow 32bit after vsll.vi.
After thi
This patch fixes the following FAILs in RISC-V regression:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP
stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -
Like ARM SVE and GCN, add RVV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-pr69907.c: Add RVV.
---
gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
b/gcc/testsuite/gcc.dg/vect
As the comment says, this test fails with 64-byte vectors.
Both RVV and GCN have 64-byte vectors.
So it's more reasonable to use vect512.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-68.c: Use vect512.
---
gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
1 file changed, 1 insertion(+), 1 dele
These iterators are redundant; removed and committed.
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Remove redundant iterators.
---
gcc/config/riscv/vector-iterators.md | 110 ---
1 file changed, 110 deletions(-)
diff --git a/gcc/config/riscv/vector-iterato
RVVM2x2QI should be rvvm2qi instead of rvvmq1i.
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Fix vsingle incorrect attribute for
RVVM2x2QI.
---
gcc/config/riscv/vector-iterators.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/riscv/vector-iterato
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
int sum1 = 0;
int sum2 = 0;
for (int i = 0; i < n; ++i)
{
sum1 += x[2*i] - a;
sum1 += x[2*i+1] * b;
sum2 += x[2*i] - b;
sum2 += x[2*i+1]
void
foo8 (int64_t *restrict a)
{
for (int i = 0; i < 16; ++i)
a[i] = a[i]-16;
}
We use VLS modes instead of VLA modes even when dynamic LMUL is specified.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use
VLS modes.
gcc/testsuite/ChangeLog:
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
int sum1 = 0;
int sum2 = 0;
for (int i = 0; i < n; ++i)
{
sum1 += x[2*i] - a;
sum1 += x[2*i+1] * b;
sum2 += x[2*i] - b;
sum2 += x[2*i+1]
This patch fixes the following FAILs in RISC-V regression:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP
stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -
Last time, Robin mentioned that dynamic LMUL will cause an ICE in SPEC:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html
which is caused by an assertion failure.
When we enable more currents in rvv.exp with dynamic LMUL, such issue can be
reproduced and has a PR: https://gcc.gnu.o
This patch optimizes the following permutation with consecutive pattern indices:
typedef char vnx16i __attribute__ ((vector_size (16)));
#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15
vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
return
Confirm the dynamic LMUL algorithm works well, choosing LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848
But it generates horrible register spilling.
The root cause is that we didn't hoist the vmv.v.x outside the loop, which
increases the SLP loop register pressure.
So, chan
This patch fixes the following FAILs in RISC-V regression:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP
stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -
From: Juzhe-Zhong
This patch supports the FMA auto-vectorization pattern.
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md which generates incorrect information for the VSETVL
pass when testing ternop-3.c.
gcc/ChangeLog:
* config/riscv/autovec.md (fma4): New pattern.
From: Juzhe-Zhong
Notice that this testcase causes an unexpected FAIL:
FAIL: gcc.target/riscv/rvv/autovec/unop/abs-run.c (test for excess errors)
Excess errors:
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7
From: Juzhe-Zhong
Even though we can't yet support floating-point operations that depend on FRM
(for example, vfadd support is blocked), since the RVV intrinsic doc is not
updated and we can't support mode switching for this,
we can support floating-point to integer conversion
From: Juzhe-Zhong
Like FMA, add FNMA auto-vectorization support.
gcc/ChangeLog:
* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec
From: Juzhe-Zhong
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
gcc/ChangeLog:
* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target
From: Juzhe-Zhong
Notice there is a warning:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison
between signed and unsigned integer expressions [-Wsign-compare]
if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37
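The warning comes from comparing a signed value with an unsigned mask. A minimal sketch of the situation follows; the typedefs and himode_mask are hypothetical stand-ins for GCC's HOST_WIDE_INT types and GET_MODE_MASK (HImode), and the explicit cast is only one common way to silence -Wsign-compare, not necessarily what this patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for GCC's HOST_WIDE_INT types (assumed 64-bit here).  */
typedef int64_t hwi;
typedef uint64_t uhwi;

/* Hypothetical stand-in for GET_MODE_MASK (HImode): the low 16 bits set.  */
static const uhwi himode_mask = 0xffff;

/* "value == himode_mask" would compare signed with unsigned and trigger
   -Wsign-compare; casting the signed side makes the conversion explicit.  */
bool
is_himode_mask (hwi value)
{
  return (uhwi) value == himode_mask;
}
```

With the cast, 0xffff matches the mask while -1 (all bits set) does not.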
From: Ju-Zhe Zhong
Following Richi's suggestion, I changed the current decrement IV flow from:
do {
remain -= MIN (vf, remain);
} while (remain != 0);
into:
do {
old_remain = remain;
len = MIN (vf, remain);
remain -= vf;
} while (old_remain >= vf);
to enhance SCEV.
ALL tests (decrement I
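The two flows above can be written out as plain C to check that both cover every element. This is a sketch under stated assumptions: old_flow and new_flow are hypothetical names, vf is assumed to be greater than 0, and the counters are unsigned, so the final "remain -= vf" in the new flow may wrap without the wrapped value being used again.

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Original flow: REMAIN is decremented by the variable length.
   Returns the total number of elements processed.  */
size_t
old_flow (size_t remain, size_t vf)
{
  size_t total = 0;
  do
    {
      size_t len = MIN (vf, remain);
      total += len;
      remain -= len;
    }
  while (remain != 0);
  return total;
}

/* New flow: REMAIN is decremented by the loop-invariant VF, and the exit
   test uses the value REMAIN had at the start of the iteration.  */
size_t
new_flow (size_t remain, size_t vf)
{
  size_t total = 0;
  size_t old_remain;
  do
    {
      old_remain = remain;
      size_t len = MIN (vf, remain);
      total += len;
      remain -= vf; /* May wrap on the last iteration; not read again.  */
    }
  while (old_remain >= vf);
  return total;
}
```

Both functions return the initial element count, for example 10 for remain = 10 and vf = 4. In the new flow the step is the invariant vf rather than MIN (vf, remain), which is the property the snippet says enhances SCEV.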
From: Juzhe-Zhong
This patch supports vector permutation for VLS only by vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
gcc/ChangeLog:
* config/riscv/autovec.md (vec_perm): New pattern.
* config/riscv/predicates.md
From: Juzhe-Zhong
Apparently, we are missing vrsub.vi tests.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto
From: Juzhe-Zhong
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt doesn't depend on FRM. So remove FRM preparing for mode switching
support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
---
gcc/config/riscv/vector.md | 4 +---
1
From: Juzhe-Zhong
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt.f.x.v doesn't depend on FRM. So remove FRM preparing for mode
switching support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
---
gcc/config/riscv/vector.md | 4 +-
From: Juzhe-Zhong
Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
---
gcc/config/riscv/vector.md | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/gcc/config/riscv/vector.md b/gcc/c
From: Juzhe-Zhong
The approach is quite simple and obvious: changing the extension pattern into
define_insn_and_split lets the combine pass combine it into widening operations
naturally.
gcc/ChangeLog:
* config/riscv/autovec.md (2): Change
expand into define_insn_and_split.
gcc/testsuite
From: Juzhe-Zhong
Based on the V1 patch, add a comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
gcc/ChangeLog:
* config/riscv/autovec.md (2): Change
exp
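A source loop of the kind that comment targets might look like the sketch below. widen_add is a hypothetical example, not one of the patch's tests; whether the compiler really emits a single vwadd.vv for it depends on the GCC version and flags, which is an assumption here.

```c
#include <stdint.h>

/* Each int8_t input is sign-extended to int16_t before the add, so a
   vectorizer sees two extensions feeding one addition: the shape that
   "vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv" describes.  */
void
widen_add (int16_t *restrict dst, const int8_t *restrict a,
           const int8_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int16_t) a[i] + (int16_t) b[i];
}
```

The result is computed at the wider type, so 127 + 127 yields 254 rather than wrapping at 8 bits.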
From: Ju-Zhe Zhong
Following Richi's suggestion, I changed the current decrement IV flow from:
do {
remain -= MIN (vf, remain);
} while (remain != 0);
into:
do {
old_remain = remain;
len = MIN (vf, remain);
remain -= vf;
} while (old_remain >= vf);
to enhance SCEV.
Include fixes from kew
From: Juzhe-Zhong
This patch supports vector permutation for VLS only by vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
Fixed following comments from Robin.
Ok for trunk?
gcc/ChangeLog:
* config/riscv/autovec.md (vec_perm
From: Juzhe-Zhong
1. This patch optimizes the codegen of the following auto-vectorization code:
void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict
c, int n)
{
for (int i = 0; i < n; i++)
c[i] = (int64_t)a[i] + b[i];
}
Combine instruction f
From: Ju-Zhe Zhong
Following Richi's suggestion, I changed the current decrement IV flow from:
do {
remain -= MIN (vf, remain);
} while (remain != 0);
into:
do {
old_remain = remain;
len = MIN (vf, remain);
remain -= vf;
} while (old_remain >= vf);
to enhance SCEV.
Include fixes from kew
From: Juzhe-Zhong
This patch enhances vwmul.vv combine optimizations.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int16_t *__restrict dst4,
int8_t
From: Juzhe-Zhong
Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
Add _mu C++ overloaded intrinsics for load && viota && vid.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-b
From: Juzhe-Zhong
According to doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
Add __RISCV_ prefix to VXRM and FRM enum.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add
From: Juzhe-Zhong
Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
Add _mu C++ overloaded intrinsics for load && viota && vid.
Co-authored-by: KuanLin Chen
gcc/ChangeLog:
* co
From: Juzhe-Zhong
Notice there is a warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function 'bool
arith_operand_or_mode_mask(rtx, machine_mode)':
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison
between signed and unsigned integer
From: Juzhe-Zhong
This patch optimizes the following series vector:
[nunits - 1, nunits - 2, ..., 0]
Before this patch:
vid
vmul
vadd
After this patch:
vid
vrsub
This patch is an obvious and simple optimization, ok for trunk?
gcc/ChangeLog:
* config/riscv/riscv-v.cc
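The equivalence of the two instruction sequences can be checked with a scalar model. This is only a sketch: the function names are hypothetical, and the operands of vmul and vadd (a multiply by -1 followed by adding nunits - 1) are an assumption, since the snippet names only the instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vid + vmul + vadd: i * (-1) + (nunits - 1).  */
void
reverse_series_mul_add (int32_t *out, size_t nunits)
{
  for (size_t i = 0; i < nunits; ++i)
    out[i] = (int32_t) i * -1 + (int32_t) (nunits - 1);
}

/* Scalar model of vid + vrsub: (nunits - 1) - i.  */
void
reverse_series_rsub (int32_t *out, size_t nunits)
{
  for (size_t i = 0; i < nunits; ++i)
    out[i] = (int32_t) (nunits - 1) - (int32_t) i;
}
```

Both models produce the same descending series, so the reverse subtract can replace the multiply and add.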
From: Juzhe-Zhong
Notice there is a warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function 'bool
arith_operand_or_mode_mask(rtx, machine_mode)':
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison
between signed and unsigned
From: Juzhe-Zhong
PR target/110109
This patch fixes the PR110109 issue. The issue happens because:
(define_insn_and_split "*vlmul_extx2"
[(set (match_operand: 0 "register_operand" "=vr, ?&vr")
(subreg:
(match_operand:VLMULEXT2
From: Juzhe-Zhong
This patch just reorganizes the functions for the following patch.
I moved rvv_builder and the emit_* functions before the expand_const_vector
function since I will use them in expand_const_vector in the following patch.
gcc/ChangeLog:
* config/riscv/riscv-v.cc
From: Juzhe-Zhong
Since the following patch will call expand_vec_perm with
split arguments, change the expand_vec_perm interface in
this patch.
gcc/ChangeLog:
* config/riscv/autovec.md: Split arguments.
* config/riscv/riscv-protos.h (expand_vec_perm): Ditto
From: Juzhe-Zhong
Move all optimization patterns into autovec-opt.md to make the organization
easier to maintain.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*not): Move to
autovec-opt.md.
(*n): Ditto.
* config/riscv/autovec.md (*not): Ditto.
(*n): Ditto
From: Ju-Zhe Zhong
This patch addresses comments from Richard and rebases to trunk.
This patch adds SELECT_VL middle-end support, which
allows targets to apply target-dependent optimizations to the
length calculation.
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
The SELE
From: Ju-Zhe Zhong
Co-authored-by: Richard Sandiford
This patch addresses comments from Richard and rebases to trunk.
This patch adds SELECT_VL middle-end support, which
allows targets to apply target-dependent optimizations to the
length calculation.
This patch is inspired by RVV ISA and LLVM:
https:
From: Juzhe-Zhong
This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
for (int i = 0; i < 100; ++i)
{
a[i * 8 + 0] = b[i * 8 + 7] + 1;
a[i * 8 + 1] = b[i * 8 + 7] + 2;
a[i * 8 + 2] =
From: Juzhe-Zhong
This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
dst[i] += (int16_t) a[i] * (int16_t) b[i];
}
Before t
From: Juzhe-Zhong
Fix according to Robin's comments on the V1 patch.
This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
ds
From: Juzhe-Zhong
This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
for (int i = 0; i < 100; ++i)
{
a[i * 8 + 0] = b[i * 8 + 7] + 1;
a[i * 8 + 1] = b[i * 8 + 7] + 2;
a[i * 8 + 2] =
From: Juzhe-Zhong
gcc/ChangeLog:
* config/riscv/autovec.md (select_vl): New pattern.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): export
global.
* config/riscv/riscv-v.cc (force_vector_length_operand): Ditto.
gcc/testsuite/ChangeLog
From: Juzhe-Zhong
Fix according to Robin's comments on the V1 patch.
This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
ds
From: Juzhe-Zhong
This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
for (int i = 0; i < 100; ++i)
{
a[i * 8 + 0] = b[i * 8 + 7] + 1;
a[i * 8 + 1] = b[i * 8 + 7] + 2;
a[i * 8 + 2] =
From: Ju-Zhe Zhong
Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener
This patch addresses comments from Richard && Richi and rebases to trunk.
This patch adds SELECT_VL middle-end support, which
allows targets to apply target-dependent optimizations to the
length calculation.
This patc
From: Juzhe-Zhong
This patch reworks Phase 5 && Phase 6 of the VSETVL pass, since Phase 5 &&
Phase 6 are quite messy and cause some bugs discovered by my downstream
auto-vectorization test-generator.
Before this patch:
Phase 5 is cleanup_insns is the function remove AVL op