[PATCH] RISC-V: Remove @ of vec_series

2023-10-04 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/autovec.md (@vec_series): Remove @. (vec_series): Ditto. * config/riscv/riscv-v.cc (expand_const_vector): Ditto. (shuffle_decompress_patterns): Ditto. --- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/riscv-v.cc | 6 +++--- 2 f

[PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Juzhe-Zhong
This patch enables almost full coverage vectorization tests for RVV, except these following tests (not enabled yet): 1. Will enable soon: check_effective_target_vect_call_lrint check_effective_target_vect_call_btrunc check_effective_target_vect_call_btruncf check_effective_target_vect_call_ceil

[PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Juzhe-Zhong
Fix these following XPASS FAILs of TSVC for RVV: XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops" XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto

[PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs: FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB" FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD" FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree

[PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs: FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB" FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD" FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree

[PATCH] RISC-V: Support movmisalign of RVV VLA modes

2023-10-08 Thread Juzhe-Zhong
Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit: https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520 I initially thought that RVV doesn't allow misaligned accesses, so I removed that pattern. However, after deep investigatio

[PATCH] TEST: Fix dump FAIL for RVV (RISCV-V vector)

2023-10-08 Thread Juzhe-Zhong
As this shows: https://godbolt.org/z/3K9oK7fx3 ARM SVE has FOLD_EXTRACT_LAST 2 times whereas RVV has it 4 times. This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and fold_extract_last fail in the first analysis and then succeed in the second. So RVV has 4 times of

[PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-08 Thread Juzhe-Zhong
RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this case well. So, adjust dump check for RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV. --- gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++-- 1 file changed, 2 insert

[PATCH] TEST: Fix dump FAIL for RVV

2023-10-08 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-cond-1.c: Fix dump FAIL for RVV. * gcc.dg/vect/pr57705.c: Ditto. --- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++-- gcc/testsuite/gcc.dg/vect/pr57705.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git

[PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

2023-10-08 Thread Juzhe-Zhong
Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer loop vectorizations. Fix these following XPASS FAILs: XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER L

[PATCH V2] RISC-V: Support movmisalign of RVV VLA modes

2023-10-09 Thread Juzhe-Zhong
This patch fixed these following FAILs in regressions: FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/vect-bitfield-read-

[PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

2023-10-09 Thread Juzhe-Zhong
Reference: https://godbolt.org/z/G9jzf5Grh RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model, RVV vectorizes it by vec_load_lanes with stride 6. gcc/testsuite/ChangeLog: * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6. --- gcc/testsuite/gcc.dg/vect/

[PATCH] RISC-V Regression test: Fix FAIL of pr45752.c for RVV

2023-10-09 Thread Juzhe-Zhong
RVV uses load_lanes with stride = 5 to vectorize this case with -fno-vect-cost-model instead of SLP. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr45752.c: Adapt dump check for target supports load_lanes with stride = 5. --- gcc/testsuite/gcc.dg/vect/pr45752.c | 2 +- 1 file changed, 1 insertio

[PATCH] RISC-V Regression tests: Fix FAIL of pr97832* for RVV

2023-10-09 Thread Juzhe-Zhong
These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP with -fno-vect-cost-model. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8. * gcc.dg/vect/pr97832-3.c: Ditto. * gcc.dg/vect/pr9

[PATCH] RISC-V Regression test: Fix FAIL of slp-12a.c

2023-10-09 Thread Juzhe-Zhong
This case is vectorized by stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes. --- gcc/testsuite/gcc.dg/vect/slp-12a.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuit

[PATCH] RISC-V Regression test: Adapt SLP tests like ARM SVE

2023-10-09 Thread Juzhe-Zhong
Like ARM SVE, RVV is vectorizing these 2 cases in the same way. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-23.c: Add RVV like ARM SVE. * gcc.dg/vect/slp-perm-10.c: Ditto. --- gcc/testsuite/gcc.dg/vect/slp-23.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-perm-10.c | 2 +- 2 files

[PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

2023-10-09 Thread Juzhe-Zhong
RVV vectorizes it with stride5 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes. --- gcc/testsuite/gcc.dg/vect/slp-perm-4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/

[PATCH] RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV

2023-10-09 Thread Juzhe-Zhong
RVV vectorizes this case with stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes. --- gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc

[PATCH] TEST: Add vectorization check

2023-10-09 Thread Juzhe-Zhong
These cases don't check SLP on targets that support load_lanes. Add a vectorization check for those situations. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97832-2.c: Add vectorization check. * gcc.dg/vect/pr97832-3.c: Ditto. * gcc.dg/vect/pr97832-4.c: Ditto. --- gcc/testsuite/gcc.dg/v

[PATCH] RISC-V: Add available vector size for RVV

2023-10-09 Thread Juzhe-Zhong
For RVV, we have VLS modes enabled according to TARGET_MIN_VLEN from M1 to M8. For example, when TARGET_MIN_VLEN = 128 bits, we enable 128/256/512/1024 bits VLS modes. This patch fixes following FAIL: FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optim
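
The size arithmetic in the example above (TARGET_MIN_VLEN = 128 giving 128/256/512/1024 bits for M1 to M8) can be sketched as below. This is only an illustration: `vls_size_in_bits` is a hypothetical helper, not a GCC function, and it assumes the size is simply the minimum VLEN multiplied by an LMUL of 1, 2, 4 or 8.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): size in bits of the VLS mode
   for a given minimum VLEN and an integer LMUL of 1, 2, 4 or 8.
   Assumes size = min_vlen * lmul, as in the 128-bit example above.  */
static unsigned
vls_size_in_bits (unsigned min_vlen, unsigned lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  return min_vlen * lmul;
}
```

Calling the helper with a minimum VLEN of 128 for each LMUL reproduces the four sizes quoted in the message.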

[PATCH] RISC-V Regression: Fix dump check of bb-slp-68.c

2023-10-09 Thread Juzhe-Zhong
Like GCN, RVV also has 64 bytes vectors (512 bits) which cause FAIL in this test. It's more reasonable to use "vect512" instead of AMDGCN. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-68.c: Use vect512. --- gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +- 1 file changed, 1 insertion(+),

[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-09 Thread Juzhe-Zhong
Here is the reference comparing dump IR between ARM SVE and RVV. https://godbolt.org/z/zqess8Gss We can see RVV has one more dump IR: optimized: basic block part vectorized using 128 byte vectors since RVV has 1024 bit vectors. The codegen is reasonably good. However, I saw GCN also has 1024 bi

[PATCH] RISC-V Regression: Make match patterns more accurate

2023-10-09 Thread Juzhe-Zhong
This patch fixes following 2 FAILs in RVV regression since the check is not accurate. It's inspired by Robin's previous patch: https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/ gcc/testsuite/ChangeLog: * gcc.dg/vect/no-scevccp-outer-7.c:

[PATCH] RISC-V Regression: Fix FAIL of predcom-2.c

2023-10-09 Thread Juzhe-Zhong
Like GCN, add -fno-tree-vectorize. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/predcom-2.c: Add riscv. --- gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-

[Committed] RISC-V: Add testcase for SCCVN optimization[PR111751]

2023-10-10 Thread Juzhe-Zhong
Add testcase for PR111751 which has been fixed: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html PR target/111751 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111751.c: New test. --- .../gcc.target/riscv/rvv/autovec/pr111751.c | 55 +

[Committed] RISC-V: Add VLS BOOL mode vcond_mask[PR111751]

2023-10-10 Thread Juzhe-Zhong
Richard's patch resolves PR111751: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65 which causes an ICE in RISC-V regression: FAIL: gcc.dg/torture/pr53144.c -O2 (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328) FAIL: gcc.dg/tortur

[PATCH] RISC-V Regression: Fix FAIL of pr65947-8.c for RVV

2023-10-10 Thread Juzhe-Zhong
This test is testing the fold_extract_last pattern, so it's more reasonable to use vect_fold_extract_last instead of specifying targets. This is the vect_fold_extract_last property: proc check_effective_target_vect_fold_extract_last { } { return [expr { [check_effective_target_aarch64_sve]

[PATCH] RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVV

2023-10-10 Thread Juzhe-Zhong
As Richard suggested: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html Add vect_ext_char_longlong to fix FAIL for RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV. * lib/target-supports.exp: Add vect_ext_char_longlong proper

[PATCH] RISC-V Regression: Make pattern match more accurate of vect-live-2.c

2023-10-10 Thread Juzhe-Zhong
Like previous patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/ gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-live-2.c: Make pattern match more accurate. --- gcc/test

[PATCH] RISC-V: Remove XFAIL of ssa-dom-cse-2.c

2023-10-10 Thread Juzhe-Zhong
Confirm RISC-V is able to CSE this case no matter whether we enable RVV or not. Remove XFAIL, to fix: XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;" gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv. --- gcc/testsuite/gcc.dg/tree-ss

[PATCH] RISC-V: Enable full coverage vect tests

2023-10-10 Thread Juzhe-Zhong
I have analyzed all existing FAILs. Except these following FAILs need to be addressed: FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/slp-reduc-7.c execution test FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " =

[PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered I made a mistake that was luckily unexposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After this patch: v
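
The overflow described above can be modelled in scalar C. This sketch only illustrates why shifting a 32-bit index left by 2 can lose bits. It does not reproduce the patch itself, whose "after" code is cut off in this listing, and both function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset computed entirely in 32 bits, as a scalar model of
   shifting a 32-bit index element left by 2: the result wraps
   modulo 2^32.  */
static uint32_t
offset_in_32_bits (uint32_t index)
{
  return index << 2;
}

/* Byte offset computed after widening the index to 64 bits first,
   so no bits are lost by the shift.  */
static uint64_t
offset_in_64_bits (uint32_t index)
{
  return (uint64_t) index << 2;
}
```

For an index of 0x40000000 the 32-bit computation wraps to 0 while the widened one yields 0x100000000, so the two offsets address different memory.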

[PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered I made a mistake that was luckily unexposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After thi

[PATCH V3] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered I made a mistake that was luckily unexposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After thi

[PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-11 Thread Juzhe-Zhong
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -

[PATCH V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-11 Thread Juzhe-Zhong
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -

[PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Juzhe-Zhong
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -

[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

2023-10-12 Thread Juzhe-Zhong
Like ARM SVE and GCN, add RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-pr69907.c: Add RVV. --- gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg/vect

[PATCH] RISC-V Regression: Fix FAIL of bb-slp-68.c for RVV

2023-10-12 Thread Juzhe-Zhong
As the comment says, this test fails on 64-byte vectors. Both RVV and GCN have 64-byte vectors. So it's more reasonable to use vect512. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-68.c: Use vect512. --- gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +- 1 file changed, 1 insertion(+), 1 dele

[Committed] RISC-V: Remove redundant iterators.

2023-10-13 Thread Juzhe-Zhong
These iterators are redundant, removed and committed. gcc/ChangeLog: * config/riscv/vector-iterators.md: Remove redundant iterators. --- gcc/config/riscv/vector-iterators.md | 110 --- 1 file changed, 110 deletions(-) diff --git a/gcc/config/riscv/vector-iterato

[Committed] RISC-V: Fix vsingle attribute

2023-10-14 Thread Juzhe-Zhong
RVVM2x2QI should be rvvm2qi instead of rvvmq1i. gcc/ChangeLog: * config/riscv/vector-iterators.md: Fix vsingle incorrect attribute for RVVM2x2QI. --- gcc/config/riscv/vector-iterators.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/vector-iterato

[PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-15 Thread Juzhe-Zhong
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1]

[PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

2023-10-16 Thread Juzhe-Zhong
void foo8 (int64_t *restrict a) { for (int i = 0; i < 16; ++i) a[i] = a[i]-16; } We use VLS modes instead of VLA modes even if dynamic LMUL is specified. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use VLS modes. gcc/testsuite/ChangeLog:

[PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1]

[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1]

[PATCH V4] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1]

[PATCH V4] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread Juzhe-Zhong
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -

[PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

2023-10-17 Thread Juzhe-Zhong
Last time, Robin has mentioned that dynamic LMUL will cause ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by assertion FAIL. When we enable more currents in rvv.exp with dynamic LMUL, such issue can be reproduced and has a PR: https://gcc.gnu.o

[PATCH] RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx

2023-10-17 Thread Juzhe-Zhong
This patch optimizes the following permutation with a consecutive index pattern: typedef char vnx16i __attribute__ ((vector_size (16))); #define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15 vnx16i __attribute__ ((noinline, noclone)) test_1 (vnx16i x, vnx16i y) { return

[PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread Juzhe-Zhong
Confirm dynamic LMUL algorithm works well for choosing LMUL = 4 for the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848 But it generates horrible register spilling. The root cause is that we didn't hoist the vmv.v.x outside the loop, which increases the SLP loop register pressure. So, chan

[PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread Juzhe-Zhong
Confirm dynamic LMUL algorithm works well for choosing LMUL = 4 for the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848 But it generates horrible register spilling. The root cause is that we didn't hoist the vmv.v.x outside the loop, which increases the SLP loop register pressure. So, chan

[PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-18 Thread Juzhe-Zhong
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -

[PATCH] RISC-V: Add RVV FMA auto-vectorization support

2023-05-26 Thread juzhe . zhong
From: Juzhe-Zhong This patch supports the FMA auto-vectorization pattern. 1. Let RA decide vmacc or vmadd. 2. Fix a bug in vector.md which generates incorrect information for the VSETVL PASS when testing ternop-3.c. gcc/ChangeLog: * config/riscv/autovec.md (fma4): New pattern.

[PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-26 Thread juzhe . zhong
From: Juzhe-Zhong This patch supports the FMA auto-vectorization pattern. 1. Let RA decide vmacc or vmadd. 2. Fix a bug in vector.md which generates incorrect information for the VSETVL PASS when testing ternop-3.c. gcc/ChangeLog: * config/riscv/autovec.md (fma4): New pattern.

[PATCH] RISC-V: Remove redundant printf of abs-run.c

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong Notice that this testcase causes an unexpected FAIL: FAIL: gcc.target/riscv/rvv/autovec/unop/abs-run.c (test for excess errors) Excess errors: /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7

[PATCH] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong Even though we can't support floating-point operations which are depending on FRM yet, (for example vfadd support is blocked) since the RVV intrinsic doc is not updated and we can't support mode switching for this. We can support floating-point to integer conversion

[PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong Even though we can't support floating-point operations which are depending on FRM yet, (for example vfadd support is blocked) since the RVV intrinsic doc is not updated and we can't support mode switching for this. We can support floating-point to integer conversion

[PATCH] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong Like FMA, Add FNMA auto-vectorization support. gcc/ChangeLog: * config/riscv/autovec.md (fnma4): New pattern. (*fnma): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test. * gcc.target/riscv/rvv/autovec

[PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong Like FMA, Add FNMA (VNMSAC or VNMSUB) auto-vectorization support. gcc/ChangeLog: * config/riscv/autovec.md (fnma4): New pattern. (*fnma): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test. * gcc.target

[PATCH] RISC-V: Fix warning in riscv.md

2023-05-29 Thread juzhe . zhong
From: Juzhe-Zhong Notice there is warning: ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37

[PATCH V2] RISC-V: Fix warning in riscv.md

2023-05-29 Thread juzhe . zhong
From: Juzhe-Zhong Notice there is warning: ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37

[PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe . zhong
From: Ju-Zhe Zhong Following Richi's suggestion, I changed the current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. ALL tests (decrement I
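
The two loop shapes quoted above can be compared with a small scalar model. This is a sketch, not the vectorizer's code: the function names are hypothetical, the loop conditions are copied exactly as they appear in the snippet, and `size_t` is an assumed type chosen so that `remain -= vf` wraps in a well-defined way on the last iteration.

```c
#include <assert.h>
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old flow: the step subtracted from `remain` is MIN (vf, remain).
   Returns the total number of elements processed.  */
static size_t
elements_old_flow (size_t n, size_t vf)
{
  assert (vf > 0);
  size_t remain = n, total = 0;
  do
    {
      size_t len = MIN (vf, remain);
      total += len;
      remain -= len;
    }
  while (remain != 0);
  return total;
}

/* New flow: `remain` always decreases by the constant vf and the exit
   test uses the value from before the decrement.  */
static size_t
elements_new_flow (size_t n, size_t vf)
{
  assert (vf > 0);
  size_t remain = n, total = 0, old_remain;
  do
    {
      old_remain = remain;
      size_t len = MIN (vf, remain);
      total += len;
      remain -= vf; /* May wrap on the last pass; unsigned wraparound is defined.  */
    }
  while (old_remain >= vf);
  return total;
}
```

Both functions process the same total number of elements. In this model, with the condition as quoted, a count that is an exact multiple of vf runs one extra pass with len equal to 0, which does not change the total.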

[PATCH] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong This patch supports vector permutation for VLS only by vec_perm pattern. We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation in the future. gcc/ChangeLog: * config/riscv/autovec.md (vec_perm): New pattern. * config/riscv/predicates.md

[PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong Apparently, we are missing vrsub.vi tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto

[PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong Based on the discussion here: https://github.com/riscv/riscv-v-spec/issues/884 vfwcvt doesn't depend on FRM. So remove FRM in preparation for mode switching support. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +--- 1

[PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong Based on the discussion here: https://github.com/riscv/riscv-v-spec/issues/884 vfwcvt.f.x.v doesn't depend on FRM. So remove FRM in preparation for mode switching support. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +-

[PATCH] RISC-V: Remove FRM for vfncvt.rod instruction

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/c

[PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong The approach is quite simple and obvious, changing extension pattern into define_insn_and_split will make combine PASS combine into widen operations naturally. gcc/ChangeLog: * config/riscv/autovec.md (2): Change expand into define_insn_and_split. gcc/testsuite

[PATCH V2] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong Based on the V1 patch, adding a comment: ;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS ;; to combine instructions as below: ;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv gcc/ChangeLog: * config/riscv/autovec.md (2): Change exp

[PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong
From: Ju-Zhe Zhong Following Richi's suggestion, I changed the current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. Include fixes from kew

[PATCH V2] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong This patch supports vector permutation for VLS only by vec_perm pattern. We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation in the future. Fixed following comments from Robin. Ok for trunk? gcc/ChangeLog: * config/riscv/autovec.md (vec_perm

[PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 1. This patch optimizes the codegen of the following auto-vectorization code: void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n) { for (int i = 0; i < n; i++) c[i] = (int64_t)a[i] + b[i]; } Combine instruction f

[PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong
From: Ju-Zhe Zhong Following Richi's suggestion, I changed the current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. Include fixes from kew

[PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong This patch is to enhance vwmul.vv combine optimizations. Consider this following code: void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int16_t *__restrict dst4, int8_t

[PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong This patch is to enhance vwmul.vv combine optimizations. Consider this following code: void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int16_t *__restrict dst4, int8_t

[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong Based on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-b

[PATCH] RISC-V: Add __RISCV_ prefix to VXRM and FRM enum

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong According to doc: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 Add __RISCV_ prefix to VXRM and FRM enum. gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add

[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong Based on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. Co-authored-by: KuanLin Chen gcc/ChangeLog: * co

[PATCH V2] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong Based on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. Co-authored-by: KuanLin Chen gcc/ChangeLog: * co

[PATCH] RISC-V: Fix warning in predicated.md

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong Notice there is warning in predicates.md: ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’: ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer

[PATCH] RISC-V: Optimize reverse series index vector

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong This patch optimizes the following series vector: [nunits - 1, nunits - 2, ..., 0] Before this patch: vid vmul vadd After this patch: vid vrsub This patch is an obvious and simple optimization, ok for trunk? gcc/ChangeLog: * config/riscv/riscv-v.cc
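
A scalar model can show why a single reverse subtract suffices for this series. It is a sketch under assumptions: the function names are hypothetical, and the multiplier of -1 and addend of nunits - 1 are a guess at what the vmul and vadd steps computed, since the snippet lists only the instruction names.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of vid + vmul + vadd: index i, multiplied by -1 (assumed),
   then increased by nunits - 1 (assumed).  */
static int64_t
reverse_element_mul_add (size_t i, size_t nunits)
{
  return (int64_t) i * -1 + ((int64_t) nunits - 1);
}

/* Model of vid + vrsub: reverse subtract, (nunits - 1) - i.  */
static int64_t
reverse_element_rsub (size_t i, size_t nunits)
{
  return ((int64_t) nunits - 1) - (int64_t) i;
}

/* Returns 1 if both forms produce the same series for every element.  */
static int
reverse_series_agree (size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    if (reverse_element_mul_add (i, nunits) != reverse_element_rsub (i, nunits))
      return 0;
  return 1;
}
```

Under those assumptions both forms give nunits - 1 for element 0 and 0 for the last element, so the three-instruction sequence and the two-instruction one describe the same vector.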

[PATCH V2] RISC-V: Fix warning in predicated.md

2023-06-02 Thread juzhe . zhong
From: Juzhe-Zhong Notice there is warning in predicates.md: ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’: ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned

[PATCH] RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong PR target/110109 This patch is to fix PR110109 issue. This issue happens is because: (define_insn_and_split "*vlmul_extx2" [(set (match_operand: 0 "register_operand" "=vr, ?&vr") (subreg: (match_operand:VLMULEXT2

[NFC] RISC-V: Reorganize riscv-v.cc

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong This patch just reorganizes the functions for the following patch. I moved rvv_builder and the emit_* functions before the expand_const_vector function since I will use them in expand_const_vector in the following patch. gcc/ChangeLog: * config/riscv/riscv-v.cc

[PATCH] RISC-V: Split arguments of expand_vec_perm

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong Since the following patch will call expand_vec_perm with split arguments, change the expand_vec_perm interface in this patch. gcc/ChangeLog: * config/riscv/autovec.md: Split arguments. * config/riscv/riscv-protos.h (expand_vec_perm): Ditto

[NFC] RISC-V: Move optimization patterns into autovec-opt.md

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong Move all optimization patterns into autovec-opt.md to make the organization easier to maintain. gcc/ChangeLog: * config/riscv/autovec-opt.md (*not): Move to autovec-opt.md. (*n): Ditto. * config/riscv/autovec.md (*not): Ditto. (*n): Ditto

[PATCH V2] VECT: Add SELECT_VL support

2023-06-04 Thread juzhe . zhong
From: Ju-Zhe Zhong This patch addresses comments from Richard and rebases to trunk. This patch adds SELECT_VL middle-end support, allowing targets to apply target-dependent optimizations to length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELE
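To make the idea of a per-iteration length calculation concrete, here is a minimal scalar sketch in C. It is an assumption-laden model, not the middle-end implementation: model_select_vl and add_one_length_controlled are hypothetical names, and the model simply picks min(remaining, max_vf), whereas a real target may be free to choose a smaller value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the target's length choice: here it is
   just min(remaining, max_vf). A real target might pick less. */
static size_t model_select_vl(size_t remaining, size_t max_vf)
{
    return remaining < max_vf ? remaining : max_vf;
}

/* Add 1 to every element of a[0..n), handling one chosen length per
   outer iteration. Returns the number of outer iterations taken. */
static size_t add_one_length_controlled(uint8_t *a, size_t n, size_t max_vf)
{
    assert(max_vf > 0);
    size_t iterations = 0;
    for (size_t done = 0; done < n; ) {
        size_t vl = model_select_vl(n - done, max_vf);
        /* Inner loop stands in for one vector operation of length vl. */
        for (size_t i = 0; i < vl; i++)
            a[done + i] += 1;
        done += vl;
        iterations++;
    }
    return iterations;
}
```

In this model, 10 elements with max_vf = 4 are processed in chunks of 4, 4 and 2. The point of the sketch is only that the loop advances by whatever length was selected, so no separate scalar epilogue is needed.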

[PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe . zhong
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford This patch addresses comments from Richard and rebases to trunk. This patch adds SELECT_VL middle-end support, allowing targets to apply target-dependent optimizations to length calculation. This patch is inspired by RVV ISA and LLVM: https:

[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-05 Thread juzhe . zhong
From: Juzhe-Zhong This patch enables basic VLA SLP auto-vectorization. Consider the following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] =

[PATCH] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong This patch adds a combine optimization for the following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) dst[i] += (int16_t) a[i] * (int16_t) b[i]; } Before t
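For reference, the loop quoted above can be exercised on its own. The sketch below is a standalone scalar version in C under a hypothetical name, widening_macc_su; the GCC-specific noipa attribute from the quoted test case is omitted for portability. It only shows the arithmetic (a signed 8-bit operand times an unsigned 8-bit operand, accumulated into 16 bits) and says nothing about the instructions GCC actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar form of the signed-by-unsigned widening multiply-accumulate:
   each 8-bit operand is widened to 16 bits before the multiply, and
   the product is accumulated into the 16-bit destination. */
static void widening_macc_su(int16_t *restrict dst,
                             const int8_t *restrict a,
                             const uint8_t *restrict b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        dst[i] += (int16_t)a[i] * (int16_t)b[i];
}
```

For example, with dst = {1, 2}, a = {-2, 3} and b = {200, 4}, the result is dst = {-399, 14}. The value 200 does not fit in a signed 8-bit type, which is why the second operand has to be treated as unsigned.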

[PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong Fix according to Robin's comments on the V1 patch. This patch adds a combine optimization for the following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) ds

[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong This patch enables basic VLA SLP auto-vectorization. Consider the following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] =

[PATCH] RISC-V: Enable SELECT_VL for RVV

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong gcc/ChangeLog: * config/riscv/autovec.md (select_vl): New pattern. * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Export global. * config/riscv/riscv-v.cc (force_vector_length_operand): Ditto. gcc/testsuite/ChangeLog

[PATCH V3] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong Fix according to Robin's comments on the V1 patch. This patch adds a combine optimization for the following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) ds

[PATCH V4] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong Fix according to Robin's comments on the V1 patch. This patch adds a combine optimization for the following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) ds

[PATCH V2] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong This patch enables basic VLA SLP auto-vectorization. Consider the following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] =

[PATCH V4] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch addresses comments from Richard and Richi and rebases to trunk. This patch adds SELECT_VL middle-end support, allowing targets to apply target-dependent optimizations to length calculation. This patc

[PATCH V5] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch addresses comments from Richard and Richi and rebases to trunk. This patch adds SELECT_VL middle-end support, allowing targets to apply target-dependent optimizations to length calculation. This patc

[PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe . zhong
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch addresses comments from Richard and Richi and rebases to trunk. This patch adds SELECT_VL middle-end support, allowing targets to apply target-dependent optimizations to length calculation. This patc

[PATCH] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong This patch reworks Phase 5 and Phase 6 of the VSETVL pass, since both phases are quite messy and cause some bugs discovered by my downstream auto-vectorization test generator. Before this patch, Phase 5 is cleanup_insns, the function that removes AVL op
