[SUBREG V4 3/4] IRA: Apply DF_LIVE_SUBREG data

2024-05-12 Thread Juzhe-Zhong
--- gcc/ira-build.cc | 7 --- gcc/ira-color.cc | 8 gcc/ira-emit.cc | 12 ++-- gcc/ira-lives.cc | 7 --- gcc/ira.cc | 19 --- 5 files changed, 30 insertions(+), 23 deletions(-) diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc index ea593d5a087..2

[SUBREG V4 2/4] DF: Add DF_LIVE_SUBREG problem

2024-05-12 Thread Juzhe-Zhong
--- gcc/Makefile.in | 1 + gcc/df-problems.cc | 886 ++- gcc/df.h | 159 +++ gcc/regs.h | 5 + gcc/sbitmap.cc | 98 + gcc/sbitmap.h| 2 + gcc/subreg-live-range.cc | 233 ++

[SUBREG V4 1/4] DF: Add -ftrack-subreg-liveness option

2024-05-12 Thread Juzhe-Zhong
--- gcc/common.opt | 4 gcc/common.opt.urls | 3 +++ gcc/doc/invoke.texi | 8 gcc/opts.cc | 1 + 4 files changed, 16 insertions(+) diff --git a/gcc/common.opt b/gcc/common.opt index 40cab3cb36a..5710e817abe 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2163,6 +21

[SUBREG V4 4/4] LRA: Apply DF_LIVE_SUBREG data

2024-05-12 Thread Juzhe-Zhong
--- gcc/lra-coalesce.cc| 27 +++- gcc/lra-constraints.cc | 109 ++--- gcc/lra-int.h | 4 + gcc/lra-lives.cc | 357 - gcc/lra-remat.cc | 8 +- gcc/lra-spills.cc | 27 +++- gcc/lra.cc | 10 +- 7 files ch

[SUBREG V4 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-05-12 Thread Juzhe-Zhong
subreg liveness tracking in the followup patches. Bootstrap and Regtested on x86-64 no regression. Co-authored-by: Lehua Ding Juzhe-Zhong (4): DF: Add -ftrack-subreg-liveness option DF: Add DF_LIVE_SUBREG problem IRA: Apply DF_LIVE_SUBREG data LRA: Apply DF_LIVE_SUBREG data gcc/M

[SUBREG V3 2/4] DF: Add DF_LIVE_SUBREG problem

2024-05-11 Thread Juzhe-Zhong
This patch add a new DF problem, named DF_LIVE_SUBREG. This problem is extended from the DF_LR problem and support track the subreg liveness of multireg pseudo if these pseudo satisfy the following conditions: 1. the mode size greater than it's REGMODE_NATURAL_SIZE. 2. the reg is used in insns

[SUBREG V3 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-05-11 Thread Juzhe-Zhong
x86-64 no regression. Co-authored-by: Lehua Ding Juzhe-Zhong (4): DF: Add -ftrack-subreg-liveness option DF: Add DF_LIVE_SUBREG problem IRA: Add DF_LIVE_SUBREG problem LRA: Apply DF_LIVE_SUBREG data gcc/Makefile.in | 1 + gcc/common.opt | 4 + gcc/common.opt.

[SUBREG V3 4/4] LRA: Apply DF_LIVE_SUBREG data

2024-05-11 Thread Juzhe-Zhong
This patch apply the DF_LIVE_SUBREG to LRA pass. More changes were made to the LRA than the IRA since the LRA will modify the DF data directly. The main big changes are centered on the lra-lives.cc file. Co-authored-by: Lehua Ding gcc/ChangeLog: * lra-coalesce.cc (update_live_info): App

[SUBREG V3 3/4] IRA: Add DF_LIVE_SUBREG problem

2024-05-11 Thread Juzhe-Zhong
This patch simple replace df_get_live_in to df_get_subreg_live_in and replace df_get_live_out to df_get_subreg_live_out. Co-authored-by: Lehua Ding gcc/ChangeLog: * ira-build.cc (create_bb_allocnos): Apply DF_LIVE_SUBREG data. (create_loop_allocnos): Diito. * ira-color.c

[SUBREG V3 1/4] DF: Add -ftrack-subreg-liveness option

2024-05-11 Thread Juzhe-Zhong
Add new flag -ftrack-subreg-liveness to enable track-subreg-liveness. This flag is enabled at -O3/fast. Co-authored-by: Lehua Ding gcc/ChangeLog: * common.opt: Add -ftrack-subreg-liveness option. * common.opt.urls: Ditto. * doc/invoke.texi: Ditto. * opts.cc: Ditt

[PATCH] RISC-V: Fix infinite compilation of VSETVL PASS

2024-02-05 Thread Juzhe-Zhong
This patch fixes issue reported by Jeff. Testing is running. Ok for trunk if I passed the testing with no regression ? gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Fix inifinite compilation. (pre_vsetvl::remove_vsetvl_pre_insns): Ditto. --- gcc/conf

[PATCH] RISC-V: Expand VLMAX scalar move in reduction

2024-02-01 Thread Juzhe-Zhong
This patch fixes the following: vsetvli a5,a1,e32,m1,tu,ma sllia4,a5,2 sub a1,a1,a5 vle32.v v2,0(a0) add a0,a0,a4 vadd.vv v1,v2,v1 bne a1,zero,.L3 vsetivlizero,1,e32,m1,ta,ma vmv.s.x v2,zero vse

[PATCH] RISC-V: Allow LICM hoist POLY_INT configuration code sequence

2024-02-01 Thread Juzhe-Zhong
Realize in recent benchmark evaluation (coremark-pro zip-test): vid.v v2 vmv.v.i v5,0 .L9: vle16.v v3,0(a4) vrsub.vxv4,v2,a6 ---> LICM failed to hoist it outside the loop. The root cause is: (insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0) (re

[PATCH] RISC-V: Remove vsetvl_pre bogus instructions in VSETVL PASS

2024-02-01 Thread Juzhe-Zhong
I realize there is a RTL regression between GCC-14 and GCC-13. https://godbolt.org/z/Ga7K6MqaT GCC-14: (insn 9 13 31 2 (set (reg:DI 15 a5 [138]) (unspec:DI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi} (expr_list:REG_EQUI

[PATCH v2] RISC-V: Suppress the vsetvl fusion for conflict successors

2024-02-01 Thread Juzhe-Zhong
Update in v2: Add dump information. This patch fixes the following ineffective vsetvl insertion: #include "riscv_vector.h" void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2) { for (size_t i = 0; i < n; i++) { if (i == cond) { vint8mf8

[PATCH] RISC-V: Disable the vsetvl fusion for conflict successors

2024-02-01 Thread Juzhe-Zhong
This patch fixes the following ineffective vsetvl insertion: #include "riscv_vector.h" void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2) { for (size_t i = 0; i < n; i++) { if (i == cond) { vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);

[PATCH] middle-end: Enhance conditional reduction vectorization by re-association in ifcvt [PR109088]

2024-01-30 Thread Juzhe-Zhong
This patch targets GCC-15. Consider this following case: unsigned int single_loop_with_if_condition (unsigned int *restrict a, unsigned int *restrict b, unsigned int *restrict c, unsigned int loop_size) { unsigned int result = 0; for (unsigned int i = 0; i < lo

[Committed] RISC-V: Fix regression

2024-01-29 Thread Juzhe-Zhong
Due to recent middle-end loop vectorizer changes, these tests have regression and the changes are reasonable. Adapt test to fix the regression. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adapt test. * gcc.target/riscv/rvv/autovec/binop/shift-rv

[PATCH] RISC-V: Fix VSETLV PASS compile-time issue

2024-01-29 Thread Juzhe-Zhong
The compile time issue was discovered in SPEC 2017 wrf: Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf compilation . Before this patch (Lazy vsetvl): scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%) machine dep reorg

[Committed] RISC-V: Refine some codes of VSETVL PASS [NFC]

2024-01-26 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Refine some codes. (pre_vsetvl::emit_vsetvl): Ditto. --- gcc/config/riscv/riscv-vsetvl.cc | 69 +--- 1 file changed, 27 insertions(+), 42 deletions(-) diff --git a

[Committed V2] RISC-V: Fix incorrect LCM delete bug [VSETVL PASS]

2024-01-25 Thread Juzhe-Zhong
This patch fixes the recent noticed bug in RV32 glibc. We incorrectly deleted a vsetvl: ... and a4,a4,a3 vmv.v.i v1,0 ---> Missed vsetvl cause illegal instruction report. vse8.v v1,0(a5) The root cause the laterin in LCM is incorrect.

[PATCH] RISC-V: Fix incorrect LCM delete bug [VSETVL PASS]

2024-01-25 Thread Juzhe-Zhong
This patch fixes the recent noticed bug in RV32 glibc. We incorrectly deleted a vsetvl: ... and a4,a4,a3 vmv.v.i v1,0 ---> Missed vsetvl cause illegal instruction report. vse8.v v1,0(a5) The root cause the laterin in LCM is incorrect.

[PATCH] RISC-V: Add LCM delete block predecessors dump information

2024-01-25 Thread Juzhe-Zhong
While looking into PR113469, I notice the LCM delete a vsetvl incorrectly. This patch add dump information of all predecessors for LCM delete vsetvl block for better debugging. Tested no regression. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function.

[Committed] RISC-V: Remove redundant full available computation [NFC]

2024-01-25 Thread Juzhe-Zhong
Notice full available is computed evey round of earliest fusion which is redundant. Actually we only need to compute it once in phase 3. It's NFC patch and tested no regression. Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data): Remove redu

[Committed] RISC-V: Add optim-no-fusion compile option [VSETVL PASS]

2024-01-24 Thread Juzhe-Zhong
This patch adds no fusion compile option to disable phase 2 global fusion. It can help us to analyze the compile-time and debugging. Committed. gcc/ChangeLog: * config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add optim-no-fusion option. * config/riscv/riscv-vsetvl.cc (pa

[PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread Juzhe-Zhong
SPEC 2017 wrf benchmark expose unreasonble memory usage of VSETVL PASS that is, VSETVL PASS consume over 33 GB memory which make use impossible to compile SPEC 2017 wrf in a laptop. The root cause is wasting-memory variables: unsigned num_exprs = num_bbs * num_regs; sbitmap *avl_def_loc = sbitmap

[PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread Juzhe-Zhong
This patch fixes the recent regression: FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in reg_or_sub

[PATCH] RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x

2024-01-21 Thread Juzhe-Zhong
Notice there is a AI benchmark, GCC vs Clang has 3% performance drop. It's because Clang/LLVM has a simplification transform vmv.v.x (avl = 1) into vmv.s.x. Since vmv.s.x has more flexible vsetvl demand than vmv.v.x that can allow us to have better chances to fuse vsetvl. Consider this followi

[PATCH] RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributes

2024-01-21 Thread Juzhe-Zhong
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of demanding sew_lmul. But my previous typo makes VSETVL PASS miss honor the risc-v v spec. Consider this following simple case: int foo4 (void * in, void * out) { vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4); v = __r

[Committed] RISC-V: Suppress warning

2024-01-19 Thread Juzhe-Zhong
../../gcc/config/riscv/riscv.cc: In function 'void riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)': ../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' [-Werror=unused-parameter] 4879 | tree fndecl, |

[PATCH V2] RISC-V: Fix RVV_VLMAX

2024-01-19 Thread Juzhe-Zhong
This patch fixes memory hog found in SPEC2017 wrf benchmark which caused by RVV_VLMAX since RVV_VLMAX generate brand new rtx by gen_rtx_REG (Pmode, X0_REGNUM) every time we call RVV_VLMAX, that is, we are always generating garbage and redundant (reg:DI 0 zero) rtx. After this patch fix, the memo

[PATCH] RISC-V: Fix RVV_VLMAX

2024-01-19 Thread Juzhe-Zhong
This patch fixes memory hog found in SPEC2017 wrf benchmark which caused by RVV_VLMAX since RVV_VLMAX generate brand new rtx by gen_rtx_REG (Pmode, X0_REGNUM) every time we call RVV_VLMAX, that is, we are always generating garbage and redundant (reg:DI 0 zero) rtx. After this patch fix, the memo

[PATCH] RISC-V: Support vi variant for vec_cmp

2024-01-18 Thread Juzhe-Zhong
While running various benchmarks, I notice we miss vi variant support for integer comparison. That is, we can vectorize code into vadd.vi but we can't vectorize into vmseq.vi. Consider this following case: void foo (int n, int **__restrict a) { int b; int c; int d; for (b = 0; b < n; b+

[PATCH v2] test regression fix: Add !vect128 for variable length targets of bb-slp-subgroups-3.c

2024-01-17 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128. --- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3

[Committed V3] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
V3: Rebase to trunk and commit it. This patch fixes SPEC2017 cam4 mismatch issue due to we miss has compatible check for conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivlizero,8,e8,mf2,ta,

[PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes SPEC2017 cam4 mismatch issue due to we miss has compatible check for conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivlizero,8,e8,mf2,ta,ma vmv.v.i v1,0 vse8

[PATCH] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes SPEC2017 cam4 mismatch issue due to we miss has compatible check for conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivlizero,8,e8,mf2,ta,ma vmv.v.i v1,0 vse8

[PATCH v2] test regression fix: Add vect128 for bb-slp-43.c

2024-01-16 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-43.c: Add vect128. --- gcc/testsuite/gcc.dg/vect/bb-slp-43.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c b/gcc/testsuite/gcc.dg/vect/bb-slp-43.c index dad2d24262d..8aedb06bf72 100

[PATCH] test regression fix: Remove xfail for variable length targets of bb-slp-subgroups-3.c

2024-01-15 Thread Juzhe-Zhong
Notice there is a regression recently: XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2 XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: basic block" 2 Checked on both ARM SVE an RVV: https://godbo

[PATCH] test regression fix: Remove xfail for variable length targets

2024-01-15 Thread Juzhe-Zhong
Recently notice there is a XPASS in RISC-V: XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2 "vector operands from scalars" XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not slp2 "vector operands from scalars" And checked both ARM SVE and RVV: https://godbolt.org/

[PATCH] RISC-V: Report Sorry when users enable RVV in big-endian mode [PR113404]

2024-01-15 Thread Juzhe-Zhong
As PR113404 mentioned: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113404 We have ICE when we enable RVV in big-endian mode: during RTL pass: expand a-float-point-dynamic-frm-66.i:2:14: internal compiler error: in to_constant, at poly-int.h:588 0xab4c2c poly_int<2u, unsigned short>::to_constant

[Committed V2] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-15 Thread Juzhe-Zhong
This patch fixes -70% performance drop from GCC-13.2 to GCC-14 with -march=rv64gcv in real hardware. The root cause is incorrect cost model cause inefficient vectorization which makes us performance drop significantly. So this patch does: 1. Adjust vector to scalar cost by introducing v to sca

[Committed V3] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-15 Thread Juzhe-Zhong
Rebase in v3: Rebase to the trunk and commit it as it's approved by Robin. Update in v2: Add dynmaic lmul test. This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0

[Committed] RISC-V: Add optimized dump check of VLS reduc tests

2024-01-15 Thread Juzhe-Zhong
Add more dump check to robostify the tests. Committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/reduc-1.c: Add dump check. * gcc.target/riscv/rvv/autovec/vls/reduc-10.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/reduc-11.c: Ditto. * gcc.target/r

[Committed] RISC-V: Fix attributes bug configuration of ternary instructions

2024-01-14 Thread Juzhe-Zhong
This patch fixes the following FAILs: Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax FAIL: gcc.c-torture/execute/pr68532.c -O0 execution test FAIL: gcc.c-torture/execute/pr68532.c -O1 execution test FAIL: gcc.c-torture/execut

[PATCH] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-14 Thread Juzhe-Zhong
This patch fixes -70% performance drop from GCC-13.2 to GCC-14 with -march=rv64gcv in real hardware. The root cause is incorrect cost model cause inefficient vectorization which makes us performance drop significantly. So this patch does: 1. Adjust vector to scalar cost by introducing v to sca

[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-14 Thread Juzhe-Zhong
Update in v2: Add dynmaic lmul test. This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0 ret Trunk GCC: vsetvli a5,zero,e8,mf2,ta,ma li

[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF [GCC 14 regression]

2024-01-12 Thread Juzhe-Zhong
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0 ret Trunk GCC: vsetvli a5,zero,e8,mf2,ta,ma li a4,-32768 vid.v v1

[PATCH V3] RISC-V: Adjust scalar_to_vec cost

2024-01-12 Thread Juzhe-Zhong
1. Introduce vector regmove new tune info. 2. Adjust scalar_to_vec cost in add_stmt_cost. We will get optimal codegen after this patch with -march=rv64gcv_zvl256b: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0 ret Tested on both RV32/R

[Committed] RISC-V: Enhance a testcase

2024-01-11 Thread Juzhe-Zhong
This test should pass no matter how we adjust cost model. Remove -fno-vect-cost-model. Committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/fold-min-poly.c: Remove -fno-vect-cost-model --- gcc/testsuite/gcc.target/riscv/rvv/autovec/fold-min-poly.c | 2 +- 1 file changed

[PATCH V2] RISC-V: Adjust scalar_to_vec cost accurately

2024-01-11 Thread Juzhe-Zhong
1. This patch set scalar_to_vec cost as 2 instead 1 since scalar move instruction is slightly more costly than normal rvv instructions (e.g. vadd.vv). 2. Adjust scalar_to_vec cost accurately according to the splat value, for example, a value like 32872, needs 2 more scalar instructions:

[PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3

2024-01-11 Thread Juzhe-Zhong
This patch fixes the following inefficient vectorized codes: vsetvli a5,zero,e8,mf2,ta,ma li a2,17 vid.v v1 li a4,-32768 vsetvli zero,zero,e16,m1,ta,ma addiw a4,a4,104 vmv.v.i v3,15 lui a1,%hi(a) li a0,1

[PATCH] RISC-V: VLA preempts VLS on unknown NITERS loop

2024-01-10 Thread Juzhe-Zhong
This patch fixes the known issues on SLP cases: ble a2,zero,.L11 addiw t1,a2,-1 li a5,15 bleut1,a5,.L9 srliw a7,t1,4 sllia7,a7,7 lui t3,%hi(.LANCHOR0) lui a6,%hi(.LANCHOR0+128) addit3,t3,%lo(.L

[PATCH V2] RISC-V: Switch RVV cost model.

2024-01-10 Thread Juzhe-Zhong
This patch is preparing patch for the following cost model tweak. Since we don't have vector cost model in default tune info (rocket), we set the cost model default as generic cost model by default. The reason we want to switch to generic vector cost model is the default cost model generates infe

[PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread Juzhe-Zhong
This patch is preparing patch for the following cost model tweak. Since we don't have vector cost model in default tune info (rocket), we set the cost model default as generic cost model by default. The reason we want to switch to generic vector cost model is the default cost model generates infe

[PATCH] RISC-V: Refine unsigned avg_floor/avg_ceil

2024-01-09 Thread Juzhe-Zhong
This patch is inspired by LLVM patches: https://github.com/llvm/llvm-project/pull/76550 https://github.com/llvm/llvm-project/pull/77473 Use vaaddu for AVG vectorization. Before this patch: vsetivlizero,8,e8,mf2,ta,ma vle8.v v3,0(a1) vle8.v v2,0(a2) vwadd

[PATCH V2] RISC-V: Minor tweak dynamic cost model

2024-01-09 Thread Juzhe-Zhong
v2 update: Robostify tests. While working on cost model, I notice one case that dynamic lmul cost doesn't work well. Before this patch: foo: lui a4,%hi(.LANCHOR0) li a0,1953 li a1,63 addia4,a4,%lo(.LANCHOR0) li a3,64 vsetvli

[PATCH] RISC-V: Minor tweak dynamic cost model

2024-01-09 Thread Juzhe-Zhong
While working on cost model, I notice one case that dynamic lmul cost doesn't work well. Before this patch: foo: lui a4,%hi(.LANCHOR0) li a0,1953 li a1,63 addia4,a4,%lo(.LANCHOR0) li a3,64 vsetvli a2,zero,e32,mf2,ta,ma

[Committed] RISC-V: Robostify dynamic lmul test

2024-01-09 Thread Juzhe-Zhong
While working on refining the cost model, I notice this test will generate unexpected scalar xor instructions if we don't tune cost model carefully. Add more assembler to avoid future regression. Committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: A

[Committed] RISC-V: Fix comments of segment load/store intrinsic

2024-01-08 Thread Juzhe-Zhong
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments. (vundefined): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++-- 1 file changed, 2 inse

[Committed] RISC-V: Fix comments of segment load/store intrinsic[NFC]

2024-01-08 Thread Juzhe-Zhong
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments to real place. (vcreate): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +--- 1 file chan

[PATCH] RISC-V: Fix loop invariant check

2024-01-08 Thread Juzhe-Zhong
As Robin suggested, remove gimple_uid check which is sufficient for our need. Tested on both RV32/RV64 no regression, ok for trunk ? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop invariant check. --- gcc/config/riscv/riscv-vector-costs.cc | 2 +-

[Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-06 Thread Juzhe-Zhong
Obvious fix, Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: replace std::max by MAX. --- gcc/config/riscv/riscv-vsetvl.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 7d748edc

[Committed] RISC-V: Update MAX_SEW for available vsevl info[VSETVL PASS]

2024-01-05 Thread Juzhe-Zhong
This patch fixes a bug of VSETVL PASS in this following situation: Ignore curr info since prev info available with it: prev_info: VALID (insn 8, bb 2) Demand fields: demand_ratio_and_ge_sew demand_avl SEW=16, VLMUL=mf4, RATIO=64, MAX_SEW=64 TAIL_POLICY=agnostic, M

[Committed V2] RISC-V: Teach liveness computation loop invariant shift amount

2024-01-05 Thread Juzhe-Zhong
1). We not only have vashl_optab,vashr_optab,vlshr_optab which vectorize shift with vector shift amount, that is, vectorization of 'a[i] >> x[i]', the shift amount is loop variant. 2). But also, we have ashl_optab, ashr_optab, lshr_optab which can vectorize shift with scalar shift amount, that is

[Committed V2] RISC-V: Allow simplification non-vlmax with len = NUNITS reg to reg move

2024-01-05 Thread Juzhe-Zhong
V2: Address comments from Robin. While working on fixing a bug, I notice this following code has redundant move: #include "riscv_vector.h" void f (float x, float y, void *out) { float f[4] = { x, x, x, y }; vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4); __riscv_vse32_v_f32m1 (out, v, 4); }

[PATCH] RISC-V: Allow simplification non-vlmax with len = NUNITS reg to reg move

2024-01-04 Thread Juzhe-Zhong
While working on fixing a bug, I notice this following code has redundant move: #include "riscv_vector.h" void f (float x, float y, void *out) { float f[4] = { x, x, x, y }; vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4); __riscv_vse32_v_f32m1 (out, v, 4); } Before this patch: f: vs

[PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]

2024-01-04 Thread Juzhe-Zhong
1). We not only have vashl_optab,vashr_optab,vlshr_optab which vectorize shift with vector shift amount, that is, vectorization of 'a[i] >> x[i]', the shift amount is loop variant. 2). But also, we have ashl_optab, ashr_optab, lshr_optab which can vectorize shift with scalar shift amount, that is

[Committed V3] RISC-V: Make liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider this following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[Committed V2] RISC-V: Make liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider this following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[PATCH] RISC-V: Teach liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider this following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[Committed] RISC-V: Refine LMUL computation for MASK_LEN_LOAD/MASK_LEN_STORE IFN

2024-01-03 Thread Juzhe-Zhong
Notice a case has "Maximum lmul = 16" which is incorrect. Correct LMUL estimation for MASK_LEN_LOAD/MASK_LEN_STORE. Committed. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (variable_vectorized_p): New function. (compute_nregs_for_mode): Refine LMUL. (max_number_of

[Committed] RISC-V: Fix indent

2024-01-03 Thread Juzhe-Zhong
Fix indent of some codes to make them 8 spaces align. Committed. gcc/ChangeLog: * config/riscv/vector.md: Fix indent. --- gcc/config/riscv/vector.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md in

[Committed V3] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As PR113206 and PR113209, the bugs happens on the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --

[PATCH V2] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As PR113206 and PR113209, the bugs happens on the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --

[PATCH] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As PR113206, the bugs happens on the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --> a4 is pollu

[Committed] RISC-V: Add simplification of dummy len and dummy mask COND_LEN_xxx pattern

2024-01-01 Thread Juzhe-Zhong
In https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d1eacedc6d9ba9f5522f2c8d49ccfdf7939ad72d I optimize COND_LEN_xxx pattern with dummy len and dummy mask with too simply solution which causes redundant vsetvli in the following case: vsetvli a5,a2,e8,m1,ta,ma vle32.v v8,0(a0)

[PATCH] RISC-V: Make liveness be aware of rgroup number of LENS[dynamic LMUL]

2024-01-01 Thread Juzhe-Zhong
This patch fixes the following situation: vl4re16.v v12,0(a5) ... vl4re16.v v16,0(a3) vs4r.v v12,0(a5) ... vl4re16.v v4,0(a0) vs4r.v v16,0(a3) ... vsetvli a3,zero,e16,m4,ta,ma ... vmv.v.x v8,t6 vmsgeu.vv v2,v16,v8 vsub.vv v16,v16,v8 vs4r.v v16,0(a5) ... vs4r.v v4,0(a0) v

[Committed] RISC-V: Declare STMT_VINFO_TYPE (...) as local variable

2024-01-01 Thread Juzhe-Zhong
Committed. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc: Move STMT_VINFO_TYPE (...) to local. --- gcc/config/riscv/riscv-vector-costs.cc | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-

[Committed] RISC-V: Robostify testcase pr113112-1.c

2023-12-28 Thread Juzhe-Zhong
The redudant dump check is fragile and easily changed, not necessary. Tested on both RV32/RV64 no regression. Remove it and committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr

[PATCH] RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model

2023-12-28 Thread Juzhe-Zhong
This patch fixes the following choosing unexpected big LMUL which cause register spillings. Before this patch, choosing LMUL = 4: addisp,sp,-160 addiw t1,a2,-1 li a5,7 bleut1,a5,.L16 vsetivlizero,8,e64,m4,ta,ma vmv.v.x v4,a0

[Committed] RISC-V: Make dynamic LMUL cost model more accurate for conversion codes

2023-12-27 Thread Juzhe-Zhong
Notice current dynamic LMUL is not accurate for conversion codes. Refine for it, there is current case is changed from choosing LMUL = 4 into LMUL = 8. Tested no regression, committed. Before this patch (LMUL = 4): After this patch (LMUL = 8): lw a7,56(sp)

[Committed] RISC-V: Make known NITERS loop be aware of dynamic lmul cost model liveness information

2023-12-27 Thread Juzhe-Zhong
Consider this following case: int f[12][100]; void bad1(int v1, int v2) { for (int r = 0; r < 100; r += 4) { int i = r + 1; f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]); f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]); f[0][r+2] = f[1][r+2] * (f[2][r+2]) -

[PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
Notice we have this following situation: vsetivlizero,4,e32,m1,ta,ma vlseg4e32.v v4,(a5) vlseg4e32.v v12,(a3) vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax vfadd.vfv3,v13,f

[PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
Notice we have this following situation: vsetivlizero,4,e32,m1,ta,ma vlseg4e32.v v4,(a5) vlseg4e32.v v12,(a3) vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax vfadd.vfv3,v13,f

[Committed] RISC-V: Fix typo

2023-12-26 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo. --- .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.

[Committed] RISC-V: Some minior tweak on dynamic LMUL cost model

2023-12-26 Thread Juzhe-Zhong
Tweak some codes of dynamic LMUL cost model to make computation more predictable and accurate. Tested on both RV32 and RV64 no regression. Committed. PR target/113112 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak LMUL estimation.

[PATCH] RISC-V: Move RVV V_REGS liveness computation into analyze_loop_vinfo

2023-12-25 Thread Juzhe-Zhong
Currently, we compute RVV V_REGS liveness during better_main_loop_than_p which is not appropriate time to do that since we for example, when have the codes will finally pick LMUL = 8 vectorization factor, we compute liveness for LMUL = 8 multiple times which are redundant. Since we have leverage

[Committed] RISC-V: Add one more ASM check in PR113112-1.c

2023-12-24 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Add one more ASM check. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c b/gcc/testsuite/

[Committed] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider this following case: foo: ble a0,zero,.L11 lui a2,%hi(.LANCHOR0) addisp,sp,-128 addia2,a2,%lo(.LANCHOR0) mv a1,a0 vsetvli a6,zero,e32,m8,ta,ma vid.v v8 vs8r.v v8,0(sp) ---> spill .L

[PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider this following case: foo: ble a0,zero,.L11 lui a2,%hi(.LANCHOR0) addisp,sp,-128 addia2,a2,%lo(.LANCHOR0) mv a1,a0 vsetvli a6,zero,e32,m8,ta,ma vid.v v8 vs8r.v v8,0(sp) ---> spill .L

[Committed] RISC-V: Add dynamic LMUL test for x264

2023-12-21 Thread Juzhe-Zhong
When working on evaluating x264 performance, I notice the best LMUL for such case with -march=rv64gcv is LMUL = 2 LMUL = 1: x264_pixel_8x8: add a4,a1,a2 addia6,a0,16 vsetivlizero,4,e8,mf4,ta,ma add a5,a4,a2 vle8.v v12,0(a6) vle

[Committed] RISC-V: Fix ICE of moving SUBREG of vector mode to DImode scalar register on RV32 system.

2023-12-20 Thread Juzhe-Zhong
This patch fixes following ICE on full coverage testing of RV32. Running target riscv-sim/-march=rv32gc_zve32f/-mabi=ilp32d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic FAIL: gcc.c-torture/compile/930120-1.c -O2 (internal compiler error: in emit_move_insn, at expr.cc:4606) FAIL: gcc.c-t

[PATCH] RISC-V: Optimize SELECT_VL codegen when length is known as smaller than VF

2023-12-19 Thread Juzhe-Zhong
While trying to fix bugs of PR113097, notice this following situation we generate redundant vsetvli _255 = SELECT_VL (3, POLY_INT_CST [4, 4]); COND_LEN (..., _255) Before this patch: vsetivli a5, 3... ... vadd.vv (use a5) After this patch: ... vadd.vv (use AVL = 3) The reason we can do this i

[PATCH] RISC-V: Fix bug of VSETVL fusion

2023-12-19 Thread Juzhe-Zhong
This patch fixes bugs in the fusion of this following case: li a5,-1 vmv.s.x v0,a5 -> demand any non-zero AVL vsetvli a5, ... Incorrect fusion after VSETVL PASS: li a5,-1 vsetvli a5... vmv.s.x v0, a5 --> a5 is modified as incorrect value. We disallow this incorrect fusion above. Full coverage

[PATCH] RISC-V: Fix FAIL of bb-slp-cond-1.c for RVV

2023-12-19 Thread Juzhe-Zhong
Due to recent VLSmode changes (Change for fixing ICE and run-time FAIL). The dump check is same as ARM SVE now. So adapt test for RISC-V. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-cond-1.c: Adapt for RISC-V. --- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++-- 1 file changed, 2

[PATCH] Regression FIX: Remove vect_variable_length XFAIL from some tests

2023-12-19 Thread Juzhe-Zhong
Hi, this patch fixes these following regression FAILs on RVV: XPASS: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;" XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2 "vector operands from scalars" XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not sl

[Committed] RISC-V: Refine some codes of expand_const_vector [NFC]

2023-12-19 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Use builder.inner_mode (). --- gcc/config/riscv/riscv-v.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index d1eb7a0a9a5..486f5deb296 1

[Committed] RISC-V: Fix FAIL of dynamic-lmul2-7.c

2023-12-18 Thread Juzhe-Zhong
Fix this FAIL: FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c scan-tree-dump-times vect "Maximum lmul = 2" 1 gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c | 2 +- 1

[Committed] RISC-V: Remove 256/512/1024 VLS vectors

2023-12-18 Thread Juzhe-Zhong
Since https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e7abd09621a4401d44f4513adf126bce4b4828b we only allow VLSmodes with size <= TARGET_MIN_VLEN * TARGET_MAX_LMUL. So when -march=rv64gcv default LMUL = 1, we don't have VLS modes of 256/512/1024 vectors. Disable them in vect test which fixes the

  1   2   3   4   5   6   7   8   9   10   >