Hi, Segher

> > > > In pr84524.c we got a loop with an extended inline asm:
> > > > asm volatile ("" : "+r" (v))
> > > > which also gives us a “surprising” situation Alexander predicts.
> > > >
> > > > For sched-deps scanner such volatile asm is a “true barrier” and we
> > > > create dependencies to almost all other instructions, while DF scanners
> > > > don’t give us such information.
> > >
> > > There is no such thing as a "true barrier" in extended asm.  The only 
> > > thing
> > > volatile asm means is that the asm has a side effect unknown to the 
> > > compiler;
> > > this can *not* be a modification of a register or of memory contents, such
> > > things are known by the compiler, that's what clobbers and "memory" 
> > > clobber
> > > are about.
> >
> > In sched-deps.c we got:
> >     case ASM_OPERANDS:
> >     case ASM_INPUT:
> >       {
> >        /* Traditional and volatile asm instructions must be considered to 
> > use
> >           and clobber all hard registers, all pseudo-registers and all of
> >           memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.
> >
> >           Consider for instance a volatile asm that changes the fpu rounding
> >           mode.  An insn should not be moved across this even if it only 
> > uses
> >           pseudo-regs because it might give an incorrectly rounded result.  
> > */
> >        if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x))
> >            && !DEBUG_INSN_P (insn))
> >          reg_pending_barrier = TRUE_BARRIER;
> >        ...
>
> This code was added in 1997 (r14770).  In 2004 the documentation was
> changed to clarify how things really work (r88999):
>
> "Note that even a volatile @code{asm} instruction can be moved relative to
> other code, including across jump instructions."
>
> (followed by an example exactly about what this means for FPU control).

Thanks for pointing to that changes!  Unfortunately, sched-deps.c was
more conservative this 15 years...
Let’s try to fix it.

> > > > Maybe it is a good idea somehow re-implement modulo scheduler using only
> > > > one scanner instead of two, but at the moment the best thing to do is to
> > > > detect the situation earlier and skip such loops.
> > >
> > > Or fix the broken code...
> >
> > I’m not sure what you mean here,
>
> I mean have the modulo scheduler implement the correct asm semantics, not
> some more restrictive thing that gets it into conflicts with DF, etc.
>
> I don't think this will turn out to be a problem in any way.  Some invalid
> asm will break, sure.

I have started with applying this without any SMS changes:

diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -2753,22 +2753,14 @@ sched_analyze_2 (struct deps_desc *deps, rtx
x, rtx_insn *insn)

     case UNSPEC_VOLATILE:
       flush_pending_lists (deps, insn, true, true);
+
+      if (!DEBUG_INSN_P (insn))
+       reg_pending_barrier = TRUE_BARRIER;
       /* FALLTHRU */

     case ASM_OPERANDS:
     case ASM_INPUT:
       {
-       /* Traditional and volatile asm instructions must be considered to use
-          and clobber all hard registers, all pseudo-registers and all of
-          memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.
-
-          Consider for instance a volatile asm that changes the fpu rounding
-          mode.  An insn should not be moved across this even if it only uses
-          pseudo-regs because it might give an incorrectly rounded result.  */
-       if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x))
-           && !DEBUG_INSN_P (insn))
-         reg_pending_barrier = TRUE_BARRIER;
-
        /* For all ASM_OPERANDS, we must traverse the vector of input operands.
           We cannot just fall through here since then we would be confused
           by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate

Regstrapping it on x86-64 shows some failures. First is with ms-sysv
abi test and can be solved like this:

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
@@ -511,7 +511,7 @@ void make_do_test (const vector<class arg> &args,
            /* End if init_test call.  */

            if (f.get_realign () && unaligned == 1)
-             out << "  __asm__ __volatile__ (\"subq $8,%%rsp\":::\"cc\");"
+             out << "  __asm__ __volatile__ (\"subq
$8,%%rsp\":::\"cc\", \"memory\");"
                  << endl;

            out << "  ret = do_test_"
@@ -525,7 +525,7 @@ void make_do_test (const vector<class arg> &args,
            out << ");" << endl;

            if (f.get_realign () && unaligned == 1)
-             out << "  __asm__ __volatile__ (\"addq $8,%%rsp\":::\"cc\");"
+             out << "  __asm__ __volatile__ (\"addq
$8,%%rsp\":::\"cc\", \"memory\");"
                  << endl;

            out << "  check_results (ret);" << endl;

Here we have some asms which control stack pointer (sigh!). It
certainly may be broken at any moment, but right now “memory” clobber
fixes everything for me.

Another failure is:

  FAIL: gcc.dg/guality/pr58791-4.c   -O2  -DPREVENT_OPTIMIZATION  line
pr58791-4.c:32 i == 486
  FAIL: gcc.dg/guality/pr58791-4.c   -O2  -DPREVENT_OPTIMIZATION  line
pr58791-4.c:32 i2 == 487
  FAIL: gcc.dg/guality/pr58791-4.c   -O3 -g  -DPREVENT_OPTIMIZATION
line pr58791-4.c:32 i == 486
  FAIL: gcc.dg/guality/pr58791-4.c   -O3 -g  -DPREVENT_OPTIMIZATION
line pr58791-4.c:32 i2 == 487
  FAIL: gcc.dg/guality/pr58791-4.c   -Os  -DPREVENT_OPTIMIZATION  line
pr58791-4.c:32 i == 486
  FAIL: gcc.dg/guality/pr58791-4.c   -Os  -DPREVENT_OPTIMIZATION  line
pr58791-4.c:32 i2 == 487
  FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i ==
486
  FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 ==
487
  FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i ==
486
  FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 ==
487

It is connected to debug-info, and I cannot solve it myself. I am not
sure how this should work when we try to print dead-code variable (i2)
while debugging -O2 (O3/Os) compiled executable.  Jakub created that
test, he is in CC already.

I will also look at aarch64 regstrap a bit later.  But that job should
also be done for all other targets.  Segher, could you test it on
rs6000?

> > > > In pr84524.c we got a loop with an extended inline asm:
> > > > asm volatile ("" : "+r" (v))

It seems a good idea to add “memory” clobber into asm and recheck DDG
in this test on aarch64 with both SMS and sched-deps patches applied.
I'll check it.


Roman

Reply via email to