[Bug target/120439] RVV: wrong tail/mask-policy when source and destination overlap with different EEW

2025-05-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439 --- Comment #5 from JuzheZhong --- (In reply to Andrew Waterman from comment #4) > Yes. Thanks. The GCC codegen is correct here. Am I right ?

[Bug target/120439] RVV: wrong tail/mask-policy when source and destination overlap with different EEW

2025-05-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439 --- Comment #3 from JuzheZhong --- (In reply to Andrew Waterman from comment #2) > > You are saying when vd and vs2 are overlapping in vnsrl, we can't allow > > undisturbed policy ? CC RISC-V folks ing. > > No. The instruction is allowed, and i

[Bug target/117947] [14/15/16 Regression] GCC miscompile rvv intrinsics at `-O2` and `-O3`, use `vlenb` after an inappropriate `vsetvli`

2025-05-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117947 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #2

[Bug target/120439] RVV: wrong tail/mask-policy when source and destination overlap with different EEW

2025-05-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #1

[Bug c++/120362] [GCC-15.1] Illegal Insn when run spec2017 511 ref size

2025-05-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120362 --- Comment #10 from JuzheZhong --- (In reply to Robin Dapp from comment #9) > > No. vlre should not depend on vtype. It should be hardware bug. > > Are you sure about that? vmv1r also doesn't depend on a specific vtype, > each one is OK, but

[Bug c++/120362] [GCC-15.1] Illegal Insn when run spec2017 511 ref size

2025-05-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120362 --- Comment #8 from JuzheZhong --- (In reply to Robin Dapp from comment #6) > (In reply to Kito Cheng from comment #5) > > Oh, vsetvli/vill issue should only appeared for whole reg move not whole reg > > load store > > On the Banana Pi I get a

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945 --- Comment #9 from JuzheZhong --- (In reply to Andrew Waterman from comment #8) > > In fact, I'd be rather surprised to see anything preferring tail > > undisturbed. > > Right. To be precise, microarchitectures without register renaming > a

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945 --- Comment #6 from JuzheZhong --- (In reply to Jeffrey A. Law from comment #5) > This doesn't seem like an ABI issue (WRT c#2), it's just question of what > uarchs prefer from a performance standpoint. > > With that in mind I'd tend to think t

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945 --- Comment #2 from JuzheZhong --- I thought about this a long time ago while I was working on supporting RVV in upstream GCC. https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/37 I suggested we should have -mprefer-agnosti

[Bug target/118057] RISC-V: Can't vectorize load and store with zvl128b

2024-12-16 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057 --- Comment #3 from JuzheZhong --- (In reply to Robin Dapp from comment #2) > I think depending on the performance of strided loads/stores this can be > profitable to vectorize. Looks like we need loop versioning to account for > the possible a

[Bug target/118057] RISC-V: Can't vectorize load and store with zvl128b

2024-12-16 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057 --- Comment #1 from JuzheZhong --- https://godbolt.org/z/q1E6dn6T9 Try -fno-vect-cost-model, it can be vectorized. I think both Clang and GCC (with no cost vect model) vectorized code can't give better performance in a wide-issue OOO superscal

[Bug target/118019] RISC-V: Performance regression in hottest function of X264

2024-12-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019 --- Comment #8 from JuzheZhong --- (In reply to Robin Dapp from comment #7) > > The problem is GCC-15 has performance regression compare to GCC-14 on both > > strict align and we should fix it, we can't specify use no strict align in > > GCC-15

[Bug target/118019] RISC-V: Performance regression in hottest function of X264

2024-12-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019 --- Comment #6 from JuzheZhong --- (In reply to Robin Dapp from comment #5) > According to Li Pan's results this is "just" vector strict align again? > We should be vectorizing the first loop, in particular after the > SLP-grouping changes. > >

[Bug target/118019] RISC-V: Performance regression in hottest function of X264

2024-12-12 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019 --- Comment #2 from JuzheZhong --- (In reply to Vineet Gupta from comment #1) > How exactly are you building it ? -march=rv64gcv_zvl512b -mabi=lp64d -mrvv-vector-bits=zvl -mrvv-max-lmul=m2

[Bug c/118019] New: RISC-V: Performance regression in hottest function of X264

2024-12-12 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019 Bug ID: 118019 Summary: RISC-V: Performance regression in hottest function of X264 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Pr

[Bug target/117974] RISC-V: VSETVL hoisting across branch

2024-12-10 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974 --- Comment #7 from JuzheZhong --- (In reply to Vineet Gupta from comment #4) > (In reply to JuzheZhong from comment #2) > > We need to split all insns since some of them are not the ultimate RVV > > instruction pattern that depend on VL/VTYPE.

[Bug target/117974] RISC-V: VSETVL hoisting across branch

2024-12-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974 --- Comment #3 from JuzheZhong --- I can optimize it if I find the time. (Currently, I am busy with other stuff).

[Bug target/117974] RISC-V: VSETVL hoisting across branch

2024-12-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974 --- Comment #2 from JuzheZhong --- We need to split all insns since some of them are not the ultimate RVV instruction patterns that depend on VL/VTYPE. And I don't think the vsetvli should be kept close to the VLE; instead, they are redundant, I think

[Bug c/117804] New: RISC-V: Worse codegen in mc_chroma of x264

2024-11-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117804 Bug ID: 117804 Summary: RISC-V: Worse codegen in mc_chroma of x264 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/117769] RISC-V: Worse codegen in x264_pixel_satd_8x4

2024-11-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-11-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 117769, which changed state. Bug 117769 Summary: RISC-V: Worse codegen in x264_pixel_satd_8x4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769 What|Removed |Added

[Bug target/117769] RISC-V: Worse codegen in x264_pixel_satd_8x4

2024-11-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769 --- Comment #2 from JuzheZhong --- Ok. I see it is not an issue now. When we enable -mno-vect-strict-align: https://godbolt.org/z/MzqzPTcc6 We have same codegen as ARM SVE now: x264_pixel_satd_8x4(unsigned char*, int, unsigned char*, int):

[Bug c/117769] New: RISC-V: Worse codegen in x264_pixel_satd_8x4

2024-11-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769 Bug ID: 117769 Summary: RISC-V: Worse codegen in x264_pixel_satd_8x4 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #4 from JuzheZhong --- (In reply to Robin Dapp from comment #3) > First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are. > > Second, we are vectorizing this, but with -mno-vector-strict-align. > > IMHO we don't need to synthesize

[Bug tree-optimization/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #1 from JuzheZhong --- OK. I see we are lacking the ssad/usad patterns (SAD_EXPR): Compute the sum of absolute differences of two signed/unsigned elements. Operand 1 and operand 2 are of the same mode. Their absolute difference, which i
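
For reference, the scalar computation that a sum-of-absolute-differences pattern captures can be sketched in C as below. This is only an illustration and not the x264 source or the GCC pattern itself: the function name `sad_4x4`, its parameters, and the fixed 4x4 block size are assumptions chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences over a 4x4 block
   of unsigned 8-bit pixels. stride1/stride2 are the row pitches, in
   bytes, of the two source buffers. */
static int sad_4x4(const uint8_t *pix1, int stride1,
                   const uint8_t *pix2, int stride2)
{
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            /* Both operands are promoted to int, so the difference
               cannot wrap before abs() is applied. */
            sum += abs(pix1[x] - pix2[x]);
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

The inner statement, an absolute difference accumulated into a wider sum, is the shape that a vectorizer would have to recognize as a SAD operation.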

[Bug c/117722] New: RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 Bug ID: 117722 Summary: RISC-V: Failed to vectorize x264_pixel_sad_4x4 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug tree-optimization/116578] vectorizer SLP transition issues / dependences

2024-09-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116578 Bug 116578 depends on bug 116691, which changed state. Bug 116691 Summary: RISC-V: Unexpected auto-vectorization codegen in simple vectorization https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691 What|Removed

[Bug target/116691] RISC-V: Unexpected auto-vectorization codegen in simple vectorization

2024-09-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/116573] [15 Regression] Recent SLP work appears to generate significantly worse code on RISC-V

2024-09-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573 --- Comment #10 from JuzheZhong --- (In reply to Richard Biener from comment #9) > So with the patch I see tons of "regressions" > (https://github.com/ewlu/gcc-precommit-ci/issues/2248#issuecomment- > 2355417578) like for example for > gcc.targe

[Bug tree-optimization/116573] [15 Regression] Recent SLP work appears to generate significantly worse code on RISC-V

2024-09-12 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573 --- Comment #4 from JuzheZhong --- (In reply to Richard Biener from comment #3) > So when investigating "future" fallout I've seen similar differences for > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c for example with the > GIMPLE diffe

[Bug c/116691] New: RISC-V: Unexpected auto-vectorization codegen in simple vectorization

2024-09-11 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691 Bug ID: 116691 Summary: RISC-V: Unexpected auto-vectorization codegen in simple vectorization Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal

[Bug target/116685] RISC-V: missed optimization on vector dot products

2024-09-11 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116685 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #4

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-07-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115819, which changed state. Bug 115819 Summary: RISC-V: Failed to hoist vrsub.vx to the header of the loop https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 What|Removed |Added --

[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 JuzheZhong changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/115759] RISC-V: complex code generated for lmbench's fwr when uses scalable autovec

2024-07-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115759 --- Comment #1 from JuzheZhong --- Do you mean you want to see the codegen look like LLVM: https://godbolt.org/z/b7W88WTGo ? I personally think GCC has better codegen than LLVM for your case in general since LLVM is using strided store whereas

[Bug target/115795] RISC-V: vsetvl step causes wrong codegen after fusing info

2024-07-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795 --- Comment #7 from JuzheZhong --- (In reply to Jordi Sala from comment #6) > Perfect, that's what I was looking for. I'm thinking of adding a way to tell > GCC to minimize, maximize or preserve SEW on vsetvl expand. Like > -mrvv-vsetvl-sew={max

[Bug target/115795] RISC-V: vsetvl step causes wrong codegen after fusing info

2024-07-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795 --- Comment #5 from JuzheZhong --- (In reply to Jordi Sala from comment #4) > problem is this is not related to the vectorizer as far as I'm aware, so > setting -mrvv-max-lmul=m8 does not change the fact that vsetvl pass is going > to change the

[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 --- Comment #4 from JuzheZhong --- (In reply to Andrew Pinski from comment #1) > This might be a cost issue. No. I don't think it's a cost issue. It's because we suppress the hoist through incorrect POLY INT handling code. I have a patch to fix it: https

[Bug c/115819] New: RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 Bug ID: 115819 Summary: RISC-V: Failed to hoist vrsub.vx to the header of the loop Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Pr

[Bug rtl-optimization/115104] [15 Regression] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-07-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 JuzheZhong changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/115795] RISC-V: vsetvl step causes wrong codegen after fusing info

2024-07-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #1

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #10 from JuzheZhong --- (In reply to Robin Dapp from comment #9) > We already merge with operand[0], just the TU is missing as far as I can > tell. > > I'm seeing the following output with my patch: > > vsetivli zero,8

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #8 from JuzheZhong --- I think we should include operands[0] as the "merge/maskoff" operand which we need to depend on and use TU for vec_set pattern Take ARM for example: (define_expand "vec_set" [(match_operand:VALL_F16 0 "regi

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #1 from JuzheZhong --- It seems that we should use TU instead of TA? Robin ?

[Bug middle-end/113474] RISC-V: Fail to use vmerge.vim for constant vector

2024-05-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/115093] RISC-V Vector ICE in extract_insn: unrecognizable insn

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115093 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #1

[Bug c/115104] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 --- Comment #1 from JuzheZhong --- I wonder whether RIVOS CI has already found which commit caused this regression?

[Bug c/115104] New: RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 Bug ID: 115104 Summary: RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: n

[Bug c/115068] New: RISC-V: Illegal instruction of vfwadd

2024-05-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115068 Bug ID: 115068 Summary: RISC-V: Illegal instruction of vfwadd Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 --- Comment #2 from JuzheZhong --- Li Pan is going to work on it. Hi, Kito and Jeff. Can this fix be backported to GCC-14?

[Bug c/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 --- Comment #1 from JuzheZhong --- Ideally, it should be reported as (-march=rv64gc): https://godbolt.org/z/3P76YEb9s : In function 'test_vfwsub_wf_f32mf2': :4:15: error: return type 'vfloat32mf2_t' requires the V ISA extension 4 | vfloat

[Bug c/114988] New: RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 Bug ID: 114988 Summary: RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component

[Bug target/114887] RISC-V: expect M8 but M4 generated with dynamic LMUL for TSVC s319

2024-04-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887 --- Comment #2 from JuzheZhong --- I think the analysis here is too conservative: note: _1: type = float, start = 1, end = 6 note: _5: type = float, start = 6, end = 8 note: _3: type = float, start = 3, end = 7 note: _4: type = floa

[Bug target/114887] RISC-V: expect M8 but M4 generated with dynamic LMUL for TSVC s319

2024-04-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887 --- Comment #1 from JuzheZhong --- The "vect" cost model analysis: https://godbolt.org/z/qbqzon8x1 note: Maximum lmul = 8, At most 40 number of live V_REG at program point 6 for bb 3 It seems that we count one more variable in program point

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #18 from JuzheZhong --- (In reply to Li Pan from comment #17) > According to the V abi, looks like the asm code tries to save/restore the > callee-saved registers when there is a call in function body. > > | Name| ABI Mnemonic |

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #16 from JuzheZhong --- This issue is not fully fixed, since the patch only fixes the ICE but there is a regression in codegen: https://godbolt.org/z/4nvxeqb6K Terrible codegen: test(__rvv_uint64m4_t): addi sp,sp,-16

[Bug target/114809] [RISC-V RVV] Counting elements might be simpler

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #3

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #6

[Bug tree-optimization/114749] [13 Regression] RISC-V rv64gcv ICE: in vectorizable_load, at tree-vect-stmts.cc

2024-04-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114749 --- Comment #4 from JuzheZhong --- Hi, Patrick. It seems that Richard didn't include the testcase in the patch. Could you send a patch to add the testcase for the RISC-V port? Thanks.

[Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spillls with -fschedule-insns

2024-04-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #5

[Bug target/114686] Feature request: Dynamic LMUL should be the default for the RISC-V Vector extension

2024-04-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #2

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #6 from JuzheZhong --- It is definitely a regression: https://compiler-explorer.com/z/e68x5sT9h GCC 13.2 is OK, but GCC 14 ICEs. I think you should bisect first.

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-04-02 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #7 from JuzheZhong --- Hi, Robin. Will you fix this bug ?

[Bug target/114506] RISC-V: expect M8 but M4 generated with dynamic LMUL

2024-03-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114506 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #4

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #19 from JuzheZhong --- I think it's better to add pr114396.c to the vect testsuite instead of the x86 target tests, since the bug does not only happen on x86.

[Bug tree-optimization/113281] [11/12/13 Regression] Latent wrong code due to vectorization of shift reduction and missing promotions since r9-1590

2024-03-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281 --- Comment #28 from JuzheZhong --- The original cost model I wrote worked for all cases, but after some middle-end changes the cost model fails. I don't have time to figure out what's going on here. Robin may be interested in it.

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #3 from JuzheZhong --- (In reply to Robin Dapp from comment #2) > It is vectorized with a higher zvl, e.g. zvl512b, refer > https://godbolt.org/z/vbfjYn5Kd. OK. I see. But Clang generates many slide instruction which are expensive i

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #1 from JuzheZhong --- It seems RISC-V Clang didn't vectorize it ? https://godbolt.org/z/G4han6vM3

[Bug target/113913] [14] RISC-V: suboptimal code gen for intrinsic vcreate

2024-02-16 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113913 --- Comment #2 from JuzheZhong --- It's a known issue that we are trying to fix in GCC-15. My colleague Lehua is taking care of it. CCing Lehua.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #16 from JuzheZhong --- The FMA is generated in widening_mul PASS: Before widening_mul (fab1): _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _6 = _5 * 1.22998223643160599749535322189331054687

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #15 from JuzheZhong --- (In reply to rguent...@suse.de from comment #14) > On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > > > --- Comment #13 from JuzheZhong --

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #13 from JuzheZhong --- Ok. I found the optimized tree: _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _8 = .FMA (_5, 1.229982236431605997495353221893310546875e-1, _4); Let CST0 = 3.3

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #12 from JuzheZhong --- Ok. I found it even without vectorization: GCC is worse than Clang: https://godbolt.org/z/addr54Gc6 GCC (14 instructions inside the loop): fld fa3,0(a0) fld fa5,8(a0) fld

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-04 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #11 from JuzheZhong --- Hi, I think this RVV compiler codegen is the optimal codegen we want for RVV: https://repo.hca.bsc.es/epic/z/P6QXCc .LBB0_5:# %vector.body sub a4, t0, a3

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-02-02 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #22 from JuzheZhong --- I have done the following experiment. diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc index bf017137260..8c36cc63d3b 100644 --- a/gcc/tree-ssa-loop-ivcanon.cc +++ b/gcc/tree-ssa-loo

[Bug target/113608] RISC-V: Vector spills after enabling vector abi

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113608 --- Comment #2 from JuzheZhong --- vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) { int vl = __riscv_vsetvlmax_e8m1(); vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl); return __riscv_vadd_vv_u16m2(a, c, vl); }

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #21 from JuzheZhong --- Hi, Richard. I looked into ivcanon. I found that: /* If the loop has more than one exit, try checking all of them for # of iterations determinable through scev. */ if (!exit) ni

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #11 from JuzheZhong --- Hi, Tamar. We are interested in supporting saturating and rounding. We may need to support scalar first. Do you have any suggestions ? Or you are already working on it? Thanks.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #10 from JuzheZhong --- Hi, Tamar. We are interested in supporting saturating and rounding. We may need to support scalar first. Do you have any suggestions ? Or you are already working on it? Thanks.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #9 from JuzheZhong --- Ok. After investigation of LLVM: Before loop vectorizer: %cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize) %conv13 = trunc i32 %cond12 to i16 After loop vectorizer: %10 = call <16 x i3
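
As a rough illustration of the scalar operation in the LLVM IR quoted above (`llvm.usub.sat.i32` followed by a `trunc` to `i16`), the same computation might be written in C as below. This is a sketch under assumptions: the function name `usub_sat_u16` and its parameter names are hypothetical, and the 16-bit input type is inferred from the truncation rather than taken from the original source.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned saturating subtraction. The 16-bit
   value is widened to 32 bits, wsize is subtracted with the result
   clamped at zero instead of wrapping, and the result is truncated
   back to 16 bits. */
static uint16_t usub_sat_u16(uint16_t value, uint32_t wsize)
{
    uint32_t wide = value;                        /* zero-extend */
    uint32_t diff = wide > wsize ? wide - wsize   /* normal case */
                                 : 0;             /* saturate */
    return (uint16_t)diff;                        /* truncate */
}
```

A compare-and-select written this way is the kind of scalar idiom that has to be recognized as a saturating subtract before it can be vectorized as one.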

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #8 from JuzheZhong --- Missing saturating-arithmetic vectorization makes RVV Clang perform 20% better than RVV GCC in a recent benchmark evaluation. In coremark pro zip-test, I believe other targets should be the same. I wonder how we sho

[Bug c/113695] RISC-V: Sources with different EEW must use different registers

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695 --- Comment #1 from JuzheZhong --- Since both operands are input operands, the early-clobber "&" constraint cannot help.

[Bug c/113695] New: RISC-V: Sources with different EEW must use different registers

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695 Bug ID: 113695 Summary: RISC-V: Sources with different EEW must use different registers Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #19 from JuzheZhong --- The loop is: bb 3 -> bb 4 -> bb 5 | |__⬆ |__⬆ The condition in bb 3 is if (i_21 == 1001). The condition in bb 4 is if (N_13(D) > i_18). Look into lsplit: This loop doesn't satisfy

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #18 from JuzheZhong --- (In reply to rguent...@suse.de from comment #17) > On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 > > > > --- Comment #16 from JuzheZhong ---

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #16 from JuzheZhong --- (In reply to rguent...@suse.de from comment #15) > On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 > > > > --- Comment #14 from JuzheZhong ---

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #14 from JuzheZhong --- Thanks Richard. It seems that we can't fix this issue for now. Is that right ? If I understand correctly, do you mean we should wait after SLP representations are finished and then revisit this PR?

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #12 from JuzheZhong --- OK. It seems to have a data dependency issue: missed: not vectorized, possible dependence between data-refs a[i_15] and a[_4] a[i_15] = _3; STMT 1 _4 = i_15 + 2; _5 = a[_4]; STMT 2 STMT2 should not depend

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #11 from JuzheZhong --- It seems that we should first fix this case (which Richard gave), which I think is not a SCEV or value-numbering issue: double a[1024]; void foo () { for (int i = 0; i < 1022; i += 2) { double tem = a

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #10 from JuzheZhong --- I think the root cause is that we treat i_16 and _1 as aliases due to scalar evolution: (get_scalar_evolution (scalar = i_16) (scalar_evolution = {0, +, 2}_1)) (get_scalar_evolution (scalar = _1) (scalar

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 JuzheZhong changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #20 from JuzheZhong --- (In reply to Robin Dapp from comment #19) > What seems odd to me is that in fre5 we simplify > > _429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0, > ... }); > vect_prephitmp_129.5

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #8 from JuzheZhong --- Hi, Richard. I have now found time to look at GCC vectorization optimization. I found this case: _2 = a[_1]; ... a[i_16] = _4; ... _7 = a[_1]; ---> This load should be eliminated and re-use _2. Am I right
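The redundancy described in comment #8 can be illustrated with a small source-level sketch. This is a hypothetical reduction, not the test case from the PR: the functions `step_reload` and `step_reuse` are invented names, and the sketch assumes that the intervening store goes to `a[i]` while the repeated load reads `a[i + 1]`, so the store cannot change the loaded value.

```c
/* Reads a[i + 1] twice, with a store to a[i] in between. */
void step_reload(double *a, int i)
{
    double first = a[i + 1];          /* corresponds to _2 = a[_1]     */
    a[i] = first * a[i];              /* corresponds to a[i_16] = _4   */
    a[i + 1] = a[i + 2] * a[i + 1];   /* second load, like _7 = a[_1]  */
}

/* Same computation, reusing the first loaded value. This is valid here
   because a[i] and a[i + 1] are distinct elements. */
void step_reuse(double *a, int i)
{
    double first = a[i + 1];
    a[i] = first * a[i];
    a[i + 1] = a[i + 2] * first;
}
```

Both functions produce the same array contents; the second form is what the comment expects the compiler to derive once it can prove that the store and the load do not overlap.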

[Bug middle-end/113166] RISC-V: Redundant move instructions in RVV intrinsic codes

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113166 --- Comment #3 from JuzheZhong --- #include #include template inline vuint8m1_t tail_load(void const* data); template<> inline vuint8m1_t tail_load(void const* data) { uint64_t const* ptr64 = reinterpret_cast(data); #if 1 const vuin

[Bug c/113666] New: RISC-V: Cost model test regression due to recent middle-end loop vectorizer changes

2024-01-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113666 Bug ID: 113666 Summary: RISC-V: Cost model test regression due to recent middle-end loop vectorizer changes Product: gcc Version: 14.0 Status: UNCONFIRMED Seve

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #15 from JuzheZhong --- Hi, Robin. I tried to disable vec_extract, then the case passed. diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 3b32369f68c..b61b886ef3d 100644 --- a/gcc/config/riscv/autovec.md

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #13 from JuzheZhong --- Ok. I found a regression between rvv-next and trunk. I believe it is GCC-12 vs GCC-14: rvv-next: ... .L11: li t1,31 mv a2,a1 bleu a7,t1,.L12 bne a6,zero,.L13

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #11 from JuzheZhong --- (In reply to Robin Dapp from comment #10) > The compile farm machine I'm using doesn't have SVE. > Compiling with -march=armv8-a -O3 pr113607.c -fno-vect-cost-model and > running it returns 0 (i.e. ok). > > p

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #9 from JuzheZhong --- Hi, Robin. Could you try this case on the latest ARM SVE, with -march=armv8-a+sve -O3 -fno-vect-cost-model? I want to make sure first that it is not a middle-end bug. The RVV vectorized IR is the same as ARM SVE. Tha

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #8 from JuzheZhong --- Ok. I can reproduce it too. I am gonna work on fixing it. Thanks.
