Eric Botcazou <ebotca...@adacore.com> writes: > There seems to be a sufficiently large consensus that volatile asms should > not > be treated as full optimization barriers by the compiler. This patch is a > first step towards implementing this conclusion and aims only at addressing > the code quality regressions introduced by > http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193802 > on the 4.8 branch and mainline for volatile asms. > > It introduces a new, temporary predicate really_volatile_insn_p and invokes > it > from the 3 places in cse.c, cselib.c and dse.c which were touched above. But > this comes with a side effect: the "blockage" standard pattern needs to be > adjusted to always use an UNSPEC_VOLATILE. That's already the case for all > the architectures that define it, but 21 architectures (aarch64 arc avr bfin > cr16 cris epiphany h8300 m32c mcore mmix mn10300 moxie msp430 nds32 picochip > rl78 rx score v850 xtensa) don't define it and therefore need to be adjusted. > > Tested on x86_64-suse-linux and by building cc1 and compiling a relevant > testcase for the 21 aforementioned architectures.
Thanks for doing this. FWIW I agree it's probably the best stop-gap fix. But the implication seems to be that unspec_volatile and volatile asms are volatile in different ways. IMO they're volatile in the same way and the problems for volatile asms apply to unspec_volatile too. E.g. although cse.c will flush the table for unspec_volatile, it isn't the case that unspec_volatile forces a containing function to save all call-saved registers. That would be excessive for a plain blockage instruction. So again we seem to be assuming one thing in places like cse.c and another in the register allocator. Code that uses the DF framework will also assume that registers are not implicitly clobbered by an unspec_volatile: case ASM_OPERANDS: case UNSPEC_VOLATILE: case TRAP_IF: case ASM_INPUT: { /* Traditional and volatile asm instructions must be considered to use and clobber all hard registers, all pseudo-registers and all of memory. So must TRAP_IF and UNSPEC_VOLATILE operations. Consider for instance a volatile asm that changes the fpu rounding mode. An insn should not be moved across this even if it only uses pseudo-regs because it might give an incorrectly rounded result. However, flow.c's liveness computation did *not* do this, giving the reasoning as " ?!? Unfortunately, marking all hard registers as live causes massive problems for the register allocator and marking all pseudos as live creates mountains of uninitialized variable warnings." In order to maintain the status quo with regard to liveness and uses, we do what flow.c did and just mark any regs we can find in ASM_OPERANDS as used. In global asm insns are scanned and regs_asm_clobbered is filled out. For all ASM_OPERANDS, we must traverse the vector of input operands. We can not just fall through here since then we would be confused by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate traditional asms unlike their normal usage. */ if (code == ASM_OPERANDS) { int j; for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++) df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j), DF_REF_REG_USE, bb, insn_info, flags); return; } break; } Also, ira-lives.c (which tracks the liveness of both pseudo and hard registers) doesn't mention "volatile" at all. So most passes assume that no pseudos or hard registers will be implicitly clobbered by unspec_volatile, just like for a volatile asm. And IMO that's right. I think the rule should be the same for volatile asms and unspec_volatiles, and the same for registers as it already is for memory: if the instruction clobbers something, it should say so explicitly. Volatile itself should: (a) prevent deletion or duplication of the operation (b) prevent reordering wrt other volatiles (c) prevent the operation from being considered equivalent to any other operation (even if it's structurally identical and has the same inputs) but nothing beyond that. So in terms of where we go from here, I'd hope we'd add something like fake hard registers to track any processor mode of interest, such as FP rounding mode once that is modelled properly. Then anything that relies on a specific mode would use that register and anything that changes the mode would set the register, meaning we have a proper dependency chain. I think we should do the same for whatever unspec_volatiles caused the RTL CSE & co. to be so conservative. Trying to leave it implicit seems too error-prone, because then you have to make sure that every rtl pass agrees on what the implicit behaviour is. Thanks, Richard