On Wed, Aug 7, 2019 at 2:20 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Wed, Aug 7, 2019 at 1:51 PM Richard Biener <rguent...@suse.de> wrote: > > > > On Wed, 7 Aug 2019, Richard Biener wrote: > > > > > On Mon, 5 Aug 2019, Uros Bizjak wrote: > > > > > > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener <rguent...@suse.de> wrote: > > > > > > > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] > > > > > > > > > > > [DI "TARGET_AVX512F"]) > > > > > > > > > > > > > > > > > > > > > > and then we need to split DImode for 32bits, too. > > > > > > > > > > > > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for > > > > > > > > > > DImode > > > > > > > > > > condition, I'll provide _doubleword splitter later. > > > > > > > > > > > > > > > > > > Shouldn't that be TARGET_AVX512VL instead? Or does the insn > > > > > > > > > use %g0 etc. > > > > > > > > > to force use of %zmmN? > > > > > > > > > > > > > > > > It generates V4SI mode, so - yes, AVX512VL. > > > > > > > > > > > > > > case SMAX: > > > > > > > case SMIN: > > > > > > > case UMAX: > > > > > > > case UMIN: > > > > > > > if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) > > > > > > > || (mode == SImode && !TARGET_SSE4_1)) > > > > > > > return false; > > > > > > > > > > > > > > so there's no way to use AVX512VL for 32bit? > > > > > > > > > > > > There is a way, but on 32bit targets, we need to split DImode > > > > > > operation to a sequence of SImode operations for unconverted > > > > > > pattern. > > > > > > This is of course doable, but somehow more complex than simply > > > > > > emitting a DImode compare + DImode cmove, which is what current > > > > > > splitter does. So, a follow-up task. > > > > > > > > > > Ah, OK. So for the above condition we can elide the !TARGET_64BIT > > > > > check we just need to properly split if we enable the scalar minmax > > > > > pattern for DImode on 32bits, the STV conversion would go fine. > > > > > > > > Yes, that is correct. > > > > > > So I tested the patch below (now with appropriate ChangeLog) on > > > x86_64-unknown-linux-gnu. I've thrown it at SPEC CPU 2006 with > > > the obvious hmmer improvement, now checking for off-noise results > > > with a 3-run on those that may have one (with more than +-1 second > > > differences in the 1-run). > > > > > > As-is the patch likely runs into the splitting issue for DImode > > > on i?86 and the patch misses functional testcases. I'll do the > > > hmmer loop with both DImode and SImode and testcases to trigger > > > all pattern variants with the different ISAs we have. > > > > > > Some of the patch could be split out (the cost changes that are > > > also effective for DImode for example). > > > > > > AFAICS we could go with only adding SImode avoiding the DImode > > > splitting thing and this would solve the hmmer regression. > > > > I've additionally bootstrapped with --with-arch=nehalem which > > reveals > > > > FAIL: gcc.target/i386/minmax-2.c scan-assembler test > > FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp > > > > we emit cmp + cmov here now with -msse4.1 (as soon as the max > > pattern is enabled I guess) > > Actually, we have to split using ix86_expand_int_compare. This will > generate optimized CC mode.
So, this only matters for comparisons against zero. Currently, the insn_and_split pattern allows only registers, but we can add other types, too. I'd say that this is benign issue. Uros.