On Mon, Feb 21, 2022 at 6:43 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Tue, Feb 22, 2022 at 2:35 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Sun, Feb 20, 2022 at 6:01 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote:
> > > > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > Reading YMM registers with all zero bits needs VZEROUPPER on
> > > > > > > Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to
> > > > > > > avoid SSE <-> AVX transition penalty.  Add
> > > > > > > TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to generate vzeroupper
> > > > > > > instruction after loading all-zero YMM/ZMM registers and enable
> > > > > > > it by default.
> > > > > > Shouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sound a bit
> > > > > > smoother?
> > > > > > Because originally we needed to add vzeroupper to all avx<->sse
> > > > > > cases, now it's a tune to indicate that we don't need to add it
> > > > > > in some
> > > > >
> > > > > Perhaps we should go from the other side and use
> > > > > X86_TUNE_OPTIMIZE_AVX_READ for new processors?
> > > > >
> > > >
> > > > Here is the v2 patch to add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO.
> > > >
> > > The patch LGTM in general, but please rebase against
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590541.html
> > > and resend the patch, also wait a couple days in case Uros (and others)
> > > have any comments.
> >
> > I am dropping my patch since it causes a compile-time regression.
> I think only the vextractif128 part is reverted, but we still have
> vmovdqu (below), which should also cause a penalty?

commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener <rguent...@suse.de>
Date:   Thu Feb 17 14:40:16 2022 +0100

    target/104581 - compile-time regression in mode-switching

has

diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c
index 803fc6e0207..7fb3a3f055c 100644
--- a/gcc/testsuite/gcc.target/i386/pr101456-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -30,4 +30,5 @@ foo3 (void)
   bar ();
 }
 
-/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* See PR104581 for the XFAIL reason.  */
+/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */

and I checked in:

commit 1931cbad498e625b1e24452dcfffe02539b12224
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Fri Feb 18 10:36:53 2022 -0800

    pieces-memset-21.c: Expect vzeroupper for ia32

    Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32
    caused by

    commit fe79d652c96b53384ddfa43e312cb0010251391b
    Author: Richard Biener <rguent...@suse.de>
    Date:   Thu Feb 17 14:40:16 2022 +0100

        target/104581 - compile-time regression in mode-switching

            PR target/104581
            * gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.

I believe that vmovdqu is also covered.

-- 
H.J.