On Wed, 7 Aug 2019, Richard Biener wrote: > On Mon, 5 Aug 2019, Uros Bizjak wrote: > > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener <rguent...@suse.de> wrote: > > > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI > > > > > > > > > "TARGET_AVX512F"]) > > > > > > > > > > > > > > > > > > and then we need to split DImode for 32bits, too. > > > > > > > > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode > > > > > > > > condition, I'll provide _doubleword splitter later. > > > > > > > > > > > > > > Shouldn't that be TARGET_AVX512VL instead? Or does the insn use > > > > > > > %g0 etc. > > > > > > > to force use of %zmmN? > > > > > > > > > > > > It generates V4SI mode, so - yes, AVX512VL. > > > > > > > > > > case SMAX: > > > > > case SMIN: > > > > > case UMAX: > > > > > case UMIN: > > > > > if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) > > > > > || (mode == SImode && !TARGET_SSE4_1)) > > > > > return false; > > > > > > > > > > so there's no way to use AVX512VL for 32bit? > > > > > > > > There is a way, but on 32bit targets, we need to split DImode > > > > operation to a sequence of SImode operations for unconverted pattern. > > > > This is of course doable, but somehow more complex than simply > > > > emitting a DImode compare + DImode cmove, which is what current > > > > splitter does. So, a follow-up task. > > > > > > Ah, OK. So for the above condition we can elide the !TARGET_64BIT > > > check we just need to properly split if we enable the scalar minmax > > > pattern for DImode on 32bits, the STV conversion would go fine. > > > > Yes, that is correct. > > So I tested the patch below (now with appropriate ChangeLog) on > x86_64-unknown-linux-gnu. I've thrown it at SPEC CPU 2006 with > the obvious hmmer improvement, now checking for off-noise results > with a 3-run on those that may have one (with more than +-1 second > differences in the 1-run).
Update on this one. On Haswell I see (besides hmmer and the ones +-1 second in the 1-run); base is unpatched, peak is patched: 401.bzip2 9650 382 25.3 S 9650 380 25.4 S 401.bzip2 9650 381 25.3 * 9650 377 25.6 * 401.bzip2 9650 381 25.3 S 9650 376 25.7 S 458.sjeng 12100 433 28.0 S 12100 433 28.0 S 458.sjeng 12100 428 28.3 S 12100 424 28.5 * 458.sjeng 12100 432 28.0 * 12100 424 28.6 S 464.h264ref 22130 413 53.6 S 22130 422 52.5 S 464.h264ref 22130 413 53.6 * 22130 421 52.5 S 464.h264ref 22130 413 53.6 S 22130 421 52.5 * 473.astar 7020 328 21.4 S 7020 316 22.2 S 473.astar 7020 322 21.8 S 7020 314 22.4 * 473.astar 7020 322 21.8 * 7020 311 22.6 S 416.gamess 19580 593 33.0 S 19580 601 32.6 S 416.gamess 19580 593 33.0 S 19580 601 32.6 * 416.gamess 19580 593 33.0 * 19580 601 32.6 S so it's a loss for 464.h264ref and 416.gamess from the above numbers and a slight win for the others (and a big one for 456.hmmer). I plan to have a look at the two as followup only, possibly adding a debug couter to be able to bisect to a specific chain. Richard.