> we are going to have some AMD CPU with AVX2 support soon, the question is
> if it will prefer 256-bit vmovups/vmovupd/vmovdqu or split, but even
> if it will prefer split, the question is if like bdver{1,2,3} it will
> be X86_TUNE_AVX128_OPTIMAL, because if yes, then how 256-bit unaligned
> loads/stores are handled is much less important there. Ganesh?

256-bit is friendly on bdver4. But 256-bit unaligned stores are
micro-coded, which we would like to avoid. So we require 128-bit MOVUPS.

-----Original Message-----
From: Jakub Jelinek [mailto:ja...@redhat.com]
Sent: Tuesday, November 12, 2013 3:57 PM
To: Jan Hubicka
Cc: H.J. Lu; Vladimir Makarov; GCC Patches; Uros Bizjak; Richard Henderson; Gopalasubramanian, Ganesh
Subject: Re: Honnor ix86_accumulate_outgoing_args again

On Tue, Nov 12, 2013 at 11:05:45AM +0100, Jan Hubicka wrote:
> > @@ -16576,7 +16576,7 @@ ix86_avx256_split_vector_move_misalign (rtx
> > op0, rtx op1)
> >
> >    if (MEM_P (op1))
> >      {
> > -      if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
> > +      if (!TARGET_AVX2 && TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
> >          {
> >            rtx r = gen_reg_rtx (mode);
> >            m = adjust_address (op1, mode, 0);
> > @@ -16596,7 +16596,7 @@
> > ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
> >      }
> >    else if (MEM_P (op0))
> >      {
> > -      if (TARGET_AVX256_SPLIT_UNALIGNED_STORE)
> > +      if (!TARGET_AVX2 && TARGET_AVX256_SPLIT_UNALIGNED_STORE)
>
> I would add explanation comment on those two.

Looking at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01235.html
we are going to have some AMD CPU with AVX2 support soon, the question is
if it will prefer 256-bit vmovups/vmovupd/vmovdqu or split, but even
if it will prefer split, the question is if like bdver{1,2,3} it will
be X86_TUNE_AVX128_OPTIMAL, because if yes, then how 256-bit unaligned
loads/stores are handled is much less important there. Ganesh?

> Shall we also disable argument accumulation for cores? It seems we won't
> solve the IRA issues, right?

You mean LRA issues here, right? If you are starting to use
no-accumulate-outgoing-args much more often than in the past, I think
the problem that LRA forces a frame pointer in that case is much more
important now (or has that been fixed in the meantime?). Vlad?

	Jakub
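
For reference, the interaction between the two patched conditions and the
store preference described at the top of this message can be modelled in a
few lines of C. This is only a sketch under stated assumptions: struct
tune_flags and the should_split_* helpers are hypothetical stand-ins, not
GCC's actual TARGET_* macros or the code in
ix86_avx256_split_vector_move_misalign. It suggests that with the
!TARGET_AVX2 guard, a target that has AVX2 and also sets the split-store
tuning would still get a single 256-bit unaligned store.

```c
#include <stdbool.h>

/* Hypothetical model of the tuning bits; not GCC's real definitions. */
struct tune_flags {
    bool avx2;                  /* stands in for TARGET_AVX2 */
    bool split_unaligned_load;  /* stands in for TARGET_AVX256_SPLIT_UNALIGNED_LOAD */
    bool split_unaligned_store; /* stands in for TARGET_AVX256_SPLIT_UNALIGNED_STORE */
};

/* Mirrors the patched load condition quoted in the diff:
   split only when AVX2 is absent and the load tuning asks for it. */
static bool should_split_load(const struct tune_flags *t)
{
    return !t->avx2 && t->split_unaligned_load;
}

/* Mirrors the patched store condition quoted in the diff. */
static bool should_split_store(const struct tune_flags *t)
{
    return !t->avx2 && t->split_unaligned_store;
}
```

In this model, { .avx2 = true, .split_unaligned_store = true } yields
should_split_store() == false, so a target wanting 128-bit MOVUPS stores
would need the guard relaxed or a separate tuning check.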