https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120067
--- Comment #2 from rdapp.gcc at gmail dot com ---
> This is reduced from 525.x264_r's 4th hottest block:
> https://godbolt.org/z/KdWv1er6f
>
> AArch64 assembly is clean and efficient (35 insns) while RISC-V's is long and
> messy (114 insns).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119373
--- Comment #10 from rdapp.gcc at gmail dot com ---
> Since the primary underlying scalar mode in the loop is DF, the autodetected
> vector mode returned by preferred_simd_mode is RVVM1DF. In comparison, AArch64
> picks VNx2DF, which allows the v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #6 from rdapp.gcc at gmail dot com ---
>> Another thought I had as we already know that SLP handles this more
>> gracefully:
>> Would it make sense to "just" defer to BB vectorization and have loop
>> vectorization not do anything, p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #4 from rdapp.gcc at gmail dot com ---
> That said - if DR analysis could, say, "force" a particular VF where it
> knows that gaps are closed we might "virtually" unroll this and thus
> detect it as a group of contiguous 16 stores. N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #4 from rdapp.gcc at gmail dot com ---
> --- Comment #3 from JuzheZhong ---
> (In reply to Robin Dapp from comment #2)
>> I think depending on the performance of strided loads/s