On Sat, Jun 4, 2022 at 1:03 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > By way of an apology for causing PR target/105791, where I'd overlooked > the need to support V1TImode in TARGET_XOP's vpcmov instruction, this > patch further improves support for TARGET_XOP's vpcmov instruction, by > recognizing it in combine. > > Currently, the test case: > > typedef int v4si __attribute__ ((vector_size (16))); > v4si foo(v4si c, v4si t, v4si f) > { > return (c&t)|(~c&f); > } > > on x86_64 with -O2 -mxop generates: > vpxor %xmm2, %xmm1, %xmm1 > vpand %xmm0, %xmm1, %xmm1 > vpxor %xmm2, %xmm1, %xmm0 > ret > > but with this patch now generates: > vpcmov %xmm0, %xmm2, %xmm1, %xmm0 > ret > > On its own, the new combine splitter works fine on TARGET_64BIT, but > alas with -m32 combine incorrectly thinks the replacement instruction > is more expensive, as IF_THEN_ELSE isn't currently/correctly handled > in ix86_rtx_costs. So to avoid the need for a target selector in the > new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov > has a latency of two cycles [it's now an obsolete instruction set > extension and there's unlikely to ever be a processor where this > instruction has a different timing], and while there I also added > rtx_costs for x86_64's integer conditional move instructions (which > have single cycle latency). > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? > > > 2022-06-04 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386.cc (ix86_rtx_costs): Add a new case for > IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and > TARGET_CMOVE's (scalar integer) conditional moves. > * config/i386/sse.md (define_split): Recognize XOP's vpcmov > from its equivalent (canonical) pxor;pand;pxor sequence. > > gcc/testsuite/ChangeLog > * gcc.target/i386/xop-pcmov3.c: New test case.
OK with a nit below. Thanks, Uros. +{ + operands[5] = REGNO (operands[4]) == REGNO (operands[1]) ? operands[2] + : operands[1]; +}) + Please expand this to enhance readability, it is a bit too cryptic for me ...