Hi, Segher > > > > In pr84524.c we got a loop with an extended inline asm: > > > > asm volatile ("" : "+r" (v)) > > > > which also gives us a “surprising” situation Alexander predicts. > > > > > > > > For sched-deps scanner such volatile asm is a “true barrier” and we > > > > create dependencies to almost all other instructions, while DF scanners > > > > don’t give us such information. > > > > > > There is no such thing as a "true barrier" in extended asm. The only > > > thing > > > volatile asm means is that the asm has a side effect unknown to the > > > compiler; > > > this can *not* be a modification of a register or of memory contents, such > > > things are known by the compiler, that's what clobbers and "memory" > > > clobber > > > are about. > > > > In sched-deps.c we got: > > case ASM_OPERANDS: > > case ASM_INPUT: > > { > > /* Traditional and volatile asm instructions must be considered to > > use > > and clobber all hard registers, all pseudo-registers and all of > > memory. So must TRAP_IF and UNSPEC_VOLATILE operations. > > > > Consider for instance a volatile asm that changes the fpu rounding > > mode. An insn should not be moved across this even if it only > > uses > > pseudo-regs because it might give an incorrectly rounded result. > > */ > > if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x)) > > && !DEBUG_INSN_P (insn)) > > reg_pending_barrier = TRUE_BARRIER; > > ... > > This code was added in 1997 (r14770). In 2004 the documentation was > changed to clarify how things really work (r88999): > > "Note that even a volatile @code{asm} instruction can be moved relative to > other code, including across jump instructions." > > (followed by an example exactly about what this means for FPU control).
Thanks for pointing to that changes! Unfortunately, sched-deps.c was more conservative this 15 years... Let’s try to fix it. > > > > Maybe it is a good idea somehow re-implement modulo scheduler using only > > > > one scanner instead of two, but at the moment the best thing to do is to > > > > detect the situation earlier and skip such loops. > > > > > > Or fix the broken code... > > > > I’m not sure what you mean here, > > I mean have the modulo scheduler implement the correct asm semantics, not > some more restrictive thing that gets it into conflicts with DF, etc. > > I don't think this will turn out to be a problem in any way. Some invalid > asm will break, sure. I have started with applying this without any SMS changes: diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -2753,22 +2753,14 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rtx_insn *insn) case UNSPEC_VOLATILE: flush_pending_lists (deps, insn, true, true); + + if (!DEBUG_INSN_P (insn)) + reg_pending_barrier = TRUE_BARRIER; /* FALLTHRU */ case ASM_OPERANDS: case ASM_INPUT: { - /* Traditional and volatile asm instructions must be considered to use - and clobber all hard registers, all pseudo-registers and all of - memory. So must TRAP_IF and UNSPEC_VOLATILE operations. - - Consider for instance a volatile asm that changes the fpu rounding - mode. An insn should not be moved across this even if it only uses - pseudo-regs because it might give an incorrectly rounded result. */ - if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x)) - && !DEBUG_INSN_P (insn)) - reg_pending_barrier = TRUE_BARRIER; - /* For all ASM_OPERANDS, we must traverse the vector of input operands. We cannot just fall through here since then we would be confused by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate Regstrapping it on x86-64 shows some failures. First is with ms-sysv abi test and can be solved like this: diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc @@ -511,7 +511,7 @@ void make_do_test (const vector<class arg> &args, /* End if init_test call. */ if (f.get_realign () && unaligned == 1) - out << " __asm__ __volatile__ (\"subq $8,%%rsp\":::\"cc\");" + out << " __asm__ __volatile__ (\"subq $8,%%rsp\":::\"cc\", \"memory\");" << endl; out << " ret = do_test_" @@ -525,7 +525,7 @@ void make_do_test (const vector<class arg> &args, out << ");" << endl; if (f.get_realign () && unaligned == 1) - out << " __asm__ __volatile__ (\"addq $8,%%rsp\":::\"cc\");" + out << " __asm__ __volatile__ (\"addq $8,%%rsp\":::\"cc\", \"memory\");" << endl; out << " check_results (ret);" << endl; Here we have some asms which control stack pointer (sigh!). It certainly may be broken at any moment, but right now “memory” clobber fixes everything for me. Another failure is: FAIL: gcc.dg/guality/pr58791-4.c -O2 -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 FAIL: gcc.dg/guality/pr58791-4.c -O2 -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487 FAIL: gcc.dg/guality/pr58791-4.c -O3 -g -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 FAIL: gcc.dg/guality/pr58791-4.c -O3 -g -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487 FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487 FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487 FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487 It is connected to debug-info, and I cannot solve it myself. I am not sure how this should work when we try to print dead-code variable (i2) while debugging -O2 (O3/Os) compiled executable. Jakub created that test, he is in CC already. I will also look at aarch64 regstrap a bit later. But that job should also be done for all other targets. Segher, could you test it on rs6000? > > > > In pr84524.c we got a loop with an extended inline asm: > > > > asm volatile ("" : "+r" (v)) It seems a good idea to add “memory” clobber into asm and recheck DDG in this test on aarch64 with both SMS and sched-deps patches applied. I'll check it. Roman