https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #15 from Robin Dapp <rdapp at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #14)
> (In reply to Li Pan from comment #7)
> > Created attachment 59661 [details]
> > with usad pattern
>
> Can you please post the patch, lest we duplicate your effort.
> It would be nice to test it on real hardware.
>
> @Robin, it seems the current codegen generates 2 widening ops, which might
> not be as efficient. We have done some profiling of widening add throughput
> and Edwin's data tells me that the throughput might not be the same.

Hmm, would you ever want the widening ops if the throughput is worse then?
I.e. if you had a throughput of 2 for simple adds and zexts but 1 for vwadd
could you not disable them altogether if they "clog" the pipeline?
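
For reference, the kind of loop a usad pattern is meant to match looks
roughly like the sketch below. This is only an illustrative reduction, not
the testcase from this PR; the name sad_u8 is hypothetical. Whether the
vectorizer picks a usad expansion, widening ops, or plain zero-extends plus
adds for it depends on the target and its cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences over n unsigned bytes.
   Hypothetical helper for illustration only, not taken from the PR.  */
uint32_t
sad_u8 (const uint8_t *a, const uint8_t *b, size_t n)
{
  assert (a != NULL && b != NULL);

  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Both operands promote to int, so the difference cannot wrap.  */
      int d = a[i] - b[i];
      /* Accumulate |d| in a wider type than the 8-bit inputs.  */
      sum += (uint32_t) (d < 0 ? -d : d);
    }
  return sum;
}
```

The 8-bit inputs feeding a 32-bit accumulator are what make widening
relevant here in the first place.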