On Sun, 4 Aug 2019, Uros Bizjak wrote:

> On Sun, Aug 4, 2019 at 7:23 PM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > On Sun, Aug 04, 2019 at 07:11:01PM +0200, Uros Bizjak wrote:
> > > Yes, the approach looks OK to me. It makes chain building mode
> > > agnostic, and the chain building can be used for
> > > a) DImode x86_32 (as is now), but maybe 64bit minmax operation can be
> > > added.
> > > b) SImode x86_32 and x86_64 (this will be mainly used for SImode
> > > minmax and surrounding SImode operations)
> > > c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
> > > DImode operations)
> > >
> > > > Still need help with the actual patterns for minmax and how the
> > > > splitters should look like.
> > >
> > > Please look at the attached patch. Maybe we can add memory_operand as
> > > operand 1 and operand 2 predicate, but let's keep things simple for
> > > now.
> >
> > Shouldn't it be used also for p{min,max}ud rather than just p{min,max}sd?
> > What about p{min,max}{s,u}{b,w,q}? Some of those are already in SSE.
>
> Sure, unsigned ops will also be added. I just went through the
> Richard's patch and looked for RTXes that Richard's patch handles. I'm
> not sure about HImode and QImode minmax operations. While these can be
> added, we would need to re-run STV in HImode and QImode - I wonder if
> it is worth.

I think we can always extend later, for now I'm trying to do
{SI,DI}mode only, but yes, u{min,max} would be nice to not miss.

> > If the conversion of the chain fails, couldn't the STV pass split those
> > SImode etc. min/max patterns into code with branches, rather than turn it
> > into cmovs?
>
> Since these patterns require SSE4.1, we are sure that we can split
> back to cmov. But IMO, cmov/jcc issue is orthogonal to minmax
> conversion and should be handled by some other machine-specific pass
> that would analyse cmove insertion and eventually split unwanted cmoves
> back to jcc (based on some yet unknown metrics). Please note that there
> is no definite proof that it is beneficial to convert cmoves to jcc for
> all x86 targets.

I guess a tunable plus (micro-)benchmarking could make this decision.
But yes, this is largely independent - and if we split to jumps
then RTL if-conversion will happily turn it back to cmoves anyway.

Richard.
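
For readers following along, here is a minimal C sketch of the scalar shapes this thread is about: signed and unsigned min/max in 32-bit (SImode) and 64-bit (DImode) integers, plus a clamp that combines min and max. It is an illustration only and not taken from any patch in the thread; all function names are hypothetical. Whether a given GCC build turns these into p{min,max}{s,u}d through the STV pass or emits cmov or branches depends on the compiler version, the target flags (for example -msse4.1), and tuning, none of which is assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32-bit (SImode) min/max: the smin/smax shape. */
int32_t smin32(int32_t a, int32_t b) { return a < b ? a : b; }
int32_t smax32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Unsigned 32-bit min/max: the umin/umax shape. Same bit patterns,
   different ordering, hence the separate unsigned patterns. */
uint32_t umin32(uint32_t a, uint32_t b) { return a < b ? a : b; }
uint32_t umax32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Signed 64-bit (DImode) min/max. */
int64_t smin64(int64_t a, int64_t b) { return a < b ? a : b; }
int64_t smax64(int64_t a, int64_t b) { return a > b ? a : b; }

/* A clamp combines a max with a min, i.e. two min/max operations
   working on the same SImode value. */
int32_t clamp32(int32_t x, int32_t lo, int32_t hi)
{
    assert(lo <= hi);               /* caller must pass an ordered range */
    return smin32(smax32(x, lo), hi);
}
```

The signed and unsigned variants differ on the same bit pattern: smin32(-1, 1) yields -1, while umin32(0xffffffffu, 1u) yields 1, which is why unsigned operations need their own handling rather than reusing the signed ones.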