[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2025-01-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #19 from GCC Commits --- The master branch has been updated by Vineet Gupta: https://gcc.gnu.org/g:b755c151fde4ad736405bb2e13a7de0420161179 commit r15-6673-gb755c151fde4ad736405bb2e13a7de0420161179 Author: Vineet Gupta Date: Tu

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #18 from Robin Dapp --- > But the point here really here is we don't need the widening semantics, more > twice. The min+max+sub in loops with a final reducing sum should do the > trick. OK I guess it can be argued that minus (max
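Below is a scalar C model of the approach quoted in comment #18. Each lane computes |a - b| as max(a, b) - min(a, b). Because that value always fits in 8 bits, the loop needs no widening, and only the final reducing sum produces a wider result. The 16-lane layout (one lane per pixel of the 4x4 block) and the function name are illustrative assumptions. The RVV mnemonics in the comments indicate only a rough correspondence, not actual GCC output.

```c
#include <stdint.h>

#define LANES 16 /* assumption: one lane per pixel of a 4x4 block */

/* Hypothetical sketch: non-widening |a - b| per lane, then one reducing sum. */
uint32_t sad_4x4_minmax(const uint8_t *pix1, intptr_t stride1,
                        const uint8_t *pix2, intptr_t stride2)
{
    uint8_t d[LANES]; /* stays 8 bits wide: max - min is always 0..255 */

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            uint8_t a = pix1[y * stride1 + x];
            uint8_t b = pix2[y * stride2 + x];
            uint8_t hi = a > b ? a : b;         /* roughly vmaxu */
            uint8_t lo = a > b ? b : a;         /* roughly vminu */
            d[y * 4 + x] = (uint8_t)(hi - lo);  /* roughly vsub, no widening */
        }
    }

    /* Final reducing sum: 16 * 255 = 4080, so even a 16-bit result suffices. */
    uint32_t sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += d[i];
    return sum;
}
```

In this model, the only step that needs more than 8 bits is the final reduction.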

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-12-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #17 from Li Pan --- (In reply to Vineet Gupta from comment #14) > (In reply to Li Pan from comment #7) > > Created attachment 59661 [details] > > with usad pattern > > Can you please post the patch, lest we duplicate your effort. >

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-12-10 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #16 from Vineet Gupta --- (In reply to Robin Dapp from comment #15) > (In reply to Vineet Gupta from comment #14) > > @Robin, it seems the current codegen generates 2 widening ops, which might > > not be as efficient. We have done s

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-12-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #15 from Robin Dapp --- (In reply to Vineet Gupta from comment #14) > (In reply to Li Pan from comment #7) > > Created attachment 59661 [details] > > with usad pattern > > Can you please post the patch, lest we duplicate your effort

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-12-10 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #14 from Vineet Gupta --- (In reply to Li Pan from comment #7) > Created attachment 59661 [details] > with usad pattern Can you please post the patch, lest we duplicate your effort. It would be nice to test it on real hardware. @Ro

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #13 from Robin Dapp --- I don't fully understand yet :) So the full-register moves are undesirable, I agree. When accumulating with a widening op they seem unavoidable, though. The only alternative would be to split out the extens

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #12 from Li Pan --- (In reply to Robin Dapp from comment #11) > (In reply to Li Pan from comment #9) > > Created attachment 59663 [details] > > before_vs_after when outer loop is 128 > > Ok, that's a different loop then. I'm seeing

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #11 from Robin Dapp --- (In reply to Li Pan from comment #9) > Created attachment 59663 [details] > before_vs_after when outer loop is 128 Ok, that's a different loop then. I'm seeing vmv1rs in the current version, is that what you

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 Jeffrey A. Law changed: CC added: law at gcc dot gnu.org --- Comment #10

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #9 from Li Pan --- Created attachment 59663 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59663&action=edit before_vs_after when outer loop is 128

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #8 from Robin Dapp --- So the difference is (usad expansion): vmax, vmin, vsub, vsext, vadd vs. (right now): vwsub, vneg, vmax, vwadd. Why is that preferable?
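The two sequences in comment #8 can be compared with a per-element scalar model in C. This is a sketch under several assumptions: pixels are 8-bit unsigned, the accumulator is 32 bits, and the abbreviated mnemonics are read for their value semantics. For the usad expansion, that means unsigned max/min and a value-preserving (zero) extension of the 0..255 difference. The function names are hypothetical. In this model both sequences produce identical sums, so the question is one of cost rather than correctness.

```c
#include <stdint.h>

/* "usad expansion": max, min and sub at 8 bits, then extend and add. */
static int32_t usad_step(int32_t acc, uint8_t a, uint8_t b)
{
    uint8_t hi = a > b ? a : b;       /* vmax (modeled as unsigned) */
    uint8_t lo = a > b ? b : a;       /* vmin (modeled as unsigned) */
    uint8_t d  = (uint8_t)(hi - lo);  /* vsub: 0..255, cannot wrap */
    return acc + (int32_t)d;          /* extend + vadd (uint8_t -> int32_t zero-extends) */
}

/* "right now": widening subtract, negate, max (= abs), widening add. */
static int32_t wide_step(int32_t acc, uint8_t a, uint8_t b)
{
    int16_t diff = (int16_t)(a - b);  /* vwsub: -255..255 fits in 16 bits */
    int16_t neg  = (int16_t)-diff;    /* vneg */
    int16_t absd = diff > neg ? diff : neg; /* vmax(diff, -diff) == |diff| */
    return acc + (int32_t)absd;       /* vwadd into the wider accumulator */
}

/* Exhaustively compare both models over all 256 x 256 input pairs. */
int sequences_agree(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (usad_step(0, (uint8_t)a, (uint8_t)b) !=
                wide_step(0, (uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

The usad form keeps three operations at 8 bits and widens only once. The current form widens twice (vwsub and vwadd), which is the point raised later in the thread.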

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #7 from Li Pan --- Created attachment 59661 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59661&action=edit with usad pattern

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #6 from Li Pan --- Created attachment 59660 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59660&action=edit upstream

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #5 from Robin Dapp --- If it's better then OK. Can you show an example?

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #4 from JuzheZhong --- (In reply to Robin Dapp from comment #3) > First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are. > > Second, we are vectorizing this, but with -mno-vector-strict-align. > > IMHO we don't need to synthesize

[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

2024-11-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722 --- Comment #3 from Robin Dapp --- First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are. Second, we are vectorizing this, but with -mno-vector-strict-align. IMHO we don't need to synthesize an usad pattern.
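For reference, the function under discussion computes a sum of absolute differences over a small pixel block. The sketch below is modeled loosely on the shape of x264's C reference SAD. The real code is generated by a macro, so the names here are hypothetical. It includes the 4x4 size from this bug and the 8x8 and 16x16 sizes that comment #3 calls hot.

```c
#include <stdint.h>
#include <stdlib.h>

/* Generic scalar SAD over an lx-by-ly block of 8-bit pixels (hypothetical names). */
int sad_block(const uint8_t *pix1, intptr_t stride1,
              const uint8_t *pix2, intptr_t stride2, int lx, int ly)
{
    int sum = 0;
    for (int y = 0; y < ly; y++) {
        for (int x = 0; x < lx; x++)
            sum += abs(pix1[x] - pix2[x]); /* operands promote to int: no wraparound */
        pix1 += stride1; /* advance one row in each image */
        pix2 += stride2;
    }
    return sum;
}

int sad_4x4(const uint8_t *p1, intptr_t s1, const uint8_t *p2, intptr_t s2)
{
    return sad_block(p1, s1, p2, s2, 4, 4);
}

int sad_8x8(const uint8_t *p1, intptr_t s1, const uint8_t *p2, intptr_t s2)
{
    return sad_block(p1, s1, p2, s2, 8, 8);
}

int sad_16x16(const uint8_t *p1, intptr_t s1, const uint8_t *p2, intptr_t s2)
{
    return sad_block(p1, s1, p2, s2, 16, 16);
}
```

With lx = 4, each row contributes only four bytes. Filling a vector register therefore means combining several strided rows, whereas the 16x16 variant has a full 16-byte row per iteration.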