[Bug target/120067] RISC-V: x264 sub4x4_dct high icount

2025-05-02 rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120067
--- Comment #2 from rdapp.gcc at gmail dot com ---
> This is reduced from 525.x264_r's 4th hottest block:
> https://godbolt.org/z/KdWv1er6f
>
> AArch64 assembly is clean and efficient (35 insns) while RISC-V's is long and
> messy (114 insns).
>
>

[Bug target/119373] RISC-V: missed unrolling opportunity

2025-04-24 rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119373
--- Comment #10 from rdapp.gcc at gmail dot com ---
> Since the primary underlying scalar mode in the loop is DF, the autodetected
> vector mode returned by preferred_simd_mode is RVVM1DF. In comparison, AArch64
> picks VNx2DF, which allows the v

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #6 from rdapp.gcc at gmail dot com ---
>> Another thought I had as we already know that SLP handles this more
>> gracefully:
>> Would it make sense to "just" defer to BB vectorization and have loop
>> vectorization not do anything, p

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #4 from rdapp.gcc at gmail dot com ---
> That said - if DR analysis could, say, "force" a particular VF where it
> knows that gaps are closed we might "virtually" unroll this and thus
> detect it as a group of contiguous 16 stores. N

[Bug target/118057] RISC-V: Can't vectorize load and store with zvl128b

2024-12-16 rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #4 from rdapp.gcc at gmail dot com ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
>
> --- Comment #3 from JuzheZhong ---
> (In reply to Robin Dapp from comment #2)
>> I think depending on the performance of strided loads/s