https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to rguent...@suse.de from comment #8)
> Sure.  Another option would be to enhance STV even further
> (or add some peephole patterns - combine runs before STV2) to
> transform the
>
>         psubd   xmm3, xmm0
>         psubd   xmm0, xmm1
>         pmaxsd  xmm0, xmm3
>
> into
>
>         psubd   %xmm3, %xmm0
>         pabsd   %xmm0, %xmm0
>
> for enhancing STV that means adding abs() patterns (or adding
> combine-like matching to the pass which I'd suggest not do).
>
> Clearly that the above conversion isn't done is a generic
> missed optimization.  Maybe you can benchmark that as well
> though I guess it won't come near the xor variant?

After PR97873 introduced abs to STV:

int test (int x, int y) { return abs (x - y); }

cc1 -O2 -msse4.1 -mstv -m32:

        movd    4(%esp), %xmm0
        movd    8(%esp), %xmm1
        psubd   %xmm1, %xmm0
        pabsd   %xmm0, %xmm0
        movd    %xmm0, %eax
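
For reference, the equivalence behind the transformation discussed in comment #8 can be sketched in scalar C. This is only an illustrative model, not code from GCC: the function names absdiff_max, absdiff_abs and forms_agree are hypothetical. It assumes the three-instruction sequence computes max (x - y, y - x) and the two-instruction sequence computes abs (x - y), and it ignores inputs where the subtraction overflows.

```c
#include <stdlib.h>

/* Scalar model of the psubd, psubd, pmaxsd sequence:
   max (x - y, y - x).  Assumes x - y does not overflow.  */
static int
absdiff_max (int x, int y)
{
  int a = x - y;
  int b = y - x;
  return a > b ? a : b;
}

/* Scalar model of the psubd, pabsd sequence: abs (x - y).
   Same no-overflow assumption as above.  */
static int
absdiff_abs (int x, int y)
{
  return abs (x - y);
}

/* Hypothetical helper: return 1 if both forms agree for all
   x, y in [-limit, limit], 0 otherwise.  */
static int
forms_agree (int limit)
{
  for (int x = -limit; x <= limit; x++)
    for (int y = -limit; y <= limit; y++)
      if (absdiff_max (x, y) != absdiff_abs (x, y))
        return 0;
  return 1;
}
```

Calling forms_agree with a small limit exercises both forms on inputs that stay well clear of overflow; it says nothing about which sequence is faster.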