On Thu, Dec 20, 2018 at 08:42:05AM +0100, Uros Bizjak wrote:
> > If one vcond argument is an all-ones (non-bool) vector and the other is
> > all zeros, we can use the vpmovm2? insns for AVX512{DQ,BW} (sometimes
> > + VL).  While if op_true is all ones and op_false all zeros we emit
> > large code that the combiner often optimizes to that vpmovm2?, if the
> > arguments are swapped we emit vpxor + vpternlog + a masked move (blend),
> > whereas we could just invert the mask with knot* and use vpmovm2?.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
> > for trunk?  The patch is large, but it is mostly reindentation; the
> > attachment contains a diff -ubpd variant of the i386.c changes to make
> > them more readable.
> >
> > 2018-12-19  Jakub Jelinek  <ja...@redhat.com>
> >
> > 	PR target/88547
> > 	* config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
> > 	emit vpmovm2? instruction perhaps after knot?.  Reorganize code
> > 	so that it doesn't have to test !maskcmp in almost every
> > 	conditional.
> >
> > 	* gcc.target/i386/pr88547-1.c: New test.
>
> LGTM, under the assumption that interunit moves from mask regs to xmm
> regs are fast.
In a simple benchmark (calling these functions in a tight loop on an
i9-7960X) the performance is the same, just with shorter sequences.

	Jakub