On Fri, Nov 25, 2016 at 10:51 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> Attached patch tries to fix the mess with operations where register > allocator is free to use either mask or general registers. There are > plenty of problems with this approach: > a) missing operations wth general registers > - kxnor operation with general register does not exist > - kandn operation with general registers exists for TARGET_BMI only > > b) register allocation problems > - DImode operations with 64bit general registers do not exist on 32bit target > - register allocator is free to allocate operation with either > register set, resulting in costly moves between general and mask > register sets > > Mask operations can be generated in a consistent way using > corresponding built-ins. To prevent RA problems, we have to separate > mask ops from general ops - this way we always select operation with > well defined register set. > > There is special problem with 64bit mask ops on 32bit targets: > defining these operations as a named pattern, where only mask > registers are available, will result in the situation where mask > operation will be used for 64bit values living in a pair of general > registers. RA will obviously generate many inter-regset moves in this > situation. > > Another problem in point a) above is dependence of general-reg > operations on various TARGET_AVX* flags. This can be seen with kxnor > pattern, which has to be artificially enabled during register > allocation just to split non-masked operation back to sequence of > general operations. Similar situation arises with kandn on non-BMI > targets. > > IMO the mentioned interferences do not warrant mixing of mask and > general operations. I suspect here will be no end of "the value is > moved between integer and mask registers unnecessary" type of bugs > (as was the situation on 32bit targets in the past, where DImode > values were moved through MMX registers). Users can and should use > relevant builtins when operating with mask registers (n.b.: generic > logic operations can still be used; RA will move values between > regsets - but in a *consistent way*). FWIW, the few folding > opportunities with mask builtins can be implemented in a > target-folding function. > > The attached patch also paves the way for implementation of new > builtins in patch [1] that are otherwise nearly to impossible to > implement. > > 2016-11-25 Uros Bizjak <ubiz...@gmail.com> > > * config/i386/i386.md (UNSPEC_KMASKOP): New. > (UNSPEC_KMOV): Remove. > (kmovw): Expand to plain HImode move. > (k<any_logic:code><mode>): Rename from *k<logic><mode>. Use > register_operand predicates. Tag pattern with UNSPEC_KMASKOP. > Remove corresponding clobber-removing splitter. > (*anddi_1): Remove mask register alternatives. > (*andsi_1): Ditto. > (*andhi_1): Ditto. > (*andqi_1): Ditto. > (*<any_or:code><mode>_1): Ditto. > (*<any_or:code>qi_1): Ditto. > (kandn<mode>): Use SWI1248_AVX512BW mode iterator. Remove > general register alternatives. Tag pattern with UNSPEC_KMASKOP. > Remove corresponding splitter to operation with general registers. > (*andn<SWI38:mode>): Rename from *bmi_andn_<mode>. > (*andn<SWI12:mode>): New pattern. > (*kxnor<mode>): Remove general register alternatives. Tag pattern > with UNSPEC_KMASKOP. Remove corresponding splitter to operation > with general registers. > (knot<mode>): New insn pattern. > (*one_cmpl<mode>2_1): Remove mask register alternatives. > (one_cmplqi2_1): Ditto. > (*k<any_lshift:code><mode>): Rename from *k<mshift><mode>3. > Tag pattern with UNSPEC_KMASKOP. Add mode attribute. > * config/i386/predicates.md (mask_reg_operand): Remove predicate. > * config/i386/sse.md (vec_unpacks_hi_hi): Update pattern > to generate kmaskop shift. > (vec_unpacks_hi_<mode>): Ditto. > * config/i386/i386-builtin.def (__builtin_ia32_kandhi): > Use CODE_FOR_kandhi. > (__builtin_ia32_knothi): Use CODE_FOR_knothi. > (__builtin_ia32_korhi): Use CODE_FOR_kiorhi. > (__builtin_ia32_kxorhi): Use CODE_FOR_kxorhi. > > Patch was bootstrapped and regression tested on x86_64-linux-gnu > {,-m32}. There are a couple fo trivial scan-asm errors due to a rename > and a missing kmov insn, where RA choose much more optimal movw > <imm>,<mem> insn instead. Now committed to mainline with additional testsuite patch: 2016-11-28 Uros Bizjak <ubiz...@gmail.com> * gcc.target/i386/bmi-andn-1a.c (dg-final): Update scan string. * gcc.target/i386/bmi-andn-2a.c (dg-final): Ditto. I'll fill gcc.target/i386/avx512f-kmovw-1.c failure in a follow-up patch. Uros. Index: gcc.target/i386/bmi-andn-1a.c =================================================================== --- gcc.target/i386/bmi-andn-1a.c (revision 242892) +++ gcc.target/i386/bmi-andn-1a.c (working copy) @@ -3,4 +3,4 @@ #include "bmi-andn-1.c" -/* { dg-final { scan-assembler-times "bmi_andn_di" 1 } } */ +/* { dg-final { scan-assembler-times "andndi" 1 } } */ Index: gcc.target/i386/bmi-andn-2a.c =================================================================== --- gcc.target/i386/bmi-andn-2a.c (revision 242892) +++ gcc.target/i386/bmi-andn-2a.c (working copy) @@ -3,4 +3,4 @@ #include "bmi-andn-2.c" -/* { dg-final { scan-assembler-times "bmi_andn_si" 1 } } */ +/* { dg-final { scan-assembler-times "andnsi" 1 } } */