https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722

--- Comment #14 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Li Pan from comment #7)
> Created attachment 59661 [details]
> with usad pattern

Can you please post the patch, lest we duplicate your effort. It would be nice
to test it on real hardware.

@Robin, it seems the current codegen generates 2 widening ops, which might not
be as efficient. We have done some profiling of widening add throughput and
Edwin's data tells me that the throughput might not be the same.