Hello, this patch fixes a corner case bug in unwinding CR values. The underlying problem is as follows:
In order to save the CR in the prologue, two insns are necessary. The first is a mfcr that loads the CR value into a GPR, and the second is a store of the GPR to the stack. The back-end currently tags the first insn with a REG_FRAME_RELATED_EXPR, and marks both as RTX_FRAME_RELATED_P. This causes the middle-end to emit two CFI records, showing the CR saved first in the GPR, and later in its stack slot. (The first record is omitted if there is no possible exception in between.) Now, assume we use -fnon-call-exceptions, and get a signal on an instruction in between those two points, and the signal handler attempts to throw. The unwind logic will read CFI records and determine that in this frame, CR was saved into a GPR; and in the frame below (which must be a signal frame), that GPR was saved onto the stack. What the unwind logic then decides to do is to restore CR from the save slot of that latter GPR; i.e. the unwinder will copy that GPR save slot over the CR save slot of the unwind routine that calls __builtin_eh_return. Unfortunately, this is a problem on a 64-bit big-endian platform, since that GPR save slot is 8 bytes, while the CR save slot is just 4 bytes. The unwinder therefore copies the wrong half of the GPR save slot over the CR save slot ... As second, related, problem will come up with the new ABI. Here, we want to track each CR field in a separate CFI record, even though the values end up sharing the same stack slot. This is not a problem for a CFI record pointing to memory. However, the current dwarf2out.c middle-end logic is unable to handle the situation where multiple registers are saved in the same *register* at the same time ... Both problems can be fixed in the same way, by never emitting a CFI record showing CR saved into a GPR. This is easily done by simply marking only the store to memory as RTX_FRAME_RELATED_P. However, to avoid generating incorrect code we must then prevent and exception point to occur in between the two insns to save CR. The easiest way to do so is to mark the store to the stack slot as an additional user of all CR fields that need to be saved by the current function. This does restrict scheduling other instructions in between the prologue a bit more than otherwise. But since this affects only very special cases (e.g. an instruction that may triggers a floating- point exception, while -fnon-call-exceptions is active), this seems not too bad ... Note that in the epilogue, this is not really a problem, even though we also need two instructions to restore CR. We can (and do) simply leave the CFI record point to the stack slot all the time, which remains valid even after the CR value was read from there (and even after the stack pointer itself was modified). The patch below fixes this problem as described above, and adds a test case that fails without the patch. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand <ulrich.weig...@de.ibm.com> * config/rs6000/rs6000.c (rs6000_emit_prologue): Do not place a RTX_FRAME_RELATED_P marker on the UNSPEC_MOVESI_FROM_CR insn. Instead, add USEs of all modified call-saved CR fields to the insn storing the result to the stack slot, and provide an appropriate REG_FRAME_RELATED_EXPR for that insn. * config/rs6000/rs6000.md ("*crsave"): New insn pattern. * config/rs6000/predicates.md ("crsave_operation"): New predicate. testsuite/ChangeLog: 2013-11-11 Ulrich Weigand <ulrich.weig...@de.ibm.com> * g++.dg/eh/ppc64-sighandle-cr.C: New test. Index: gcc/gcc/config/rs6000/rs6000.c =================================================================== --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -21761,21 +21761,9 @@ rs6000_emit_prologue (void) && REGNO (frame_reg_rtx) != cr_save_regno && !(using_static_chain_p && cr_save_regno == 11)) { - rtx set; - cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno); START_USE (cr_save_regno); - insn = emit_insn (gen_movesi_from_cr (cr_save_rtx)); - RTX_FRAME_RELATED_P (insn) = 1; - /* Now, there's no way that dwarf2out_frame_debug_expr is going - to understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)'. - But that's OK. All we have to do is specify that _one_ condition - code register is saved in this stack slot. The thrower's epilogue - will then restore all the call-saved registers. - We use CR2_REGNO (70) to be compatible with gcc-2.95 on Linux. */ - set = gen_rtx_SET (VOIDmode, cr_save_rtx, - gen_rtx_REG (SImode, CR2_REGNO)); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, set); + emit_insn (gen_movesi_from_cr (cr_save_rtx)); } /* Do any required saving of fpr's. If only one or two to save, do @@ -22086,26 +22074,62 @@ rs6000_emit_prologue (void) rtx addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, GEN_INT (info->cr_save_offset + frame_off)); rtx mem = gen_frame_mem (SImode, addr); - /* See the large comment above about why CR2_REGNO is used. */ - rtx magic_eh_cr_reg = gen_rtx_REG (SImode, CR2_REGNO); /* If we didn't copy cr before, do so now using r0. */ if (cr_save_rtx == NULL_RTX) { - rtx set; - START_USE (0); cr_save_rtx = gen_rtx_REG (SImode, 0); - insn = emit_insn (gen_movesi_from_cr (cr_save_rtx)); - RTX_FRAME_RELATED_P (insn) = 1; - set = gen_rtx_SET (VOIDmode, cr_save_rtx, magic_eh_cr_reg); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, set); + emit_insn (gen_movesi_from_cr (cr_save_rtx)); } - insn = emit_move_insn (mem, cr_save_rtx); + + /* Saving CR requires a two-instruction sequence: one instruction + to move the CR to a general-purpose register, and a second + instruction that stores the GPR to memory. + + We do not emit any DWARF CFI records for the first of these, + because we cannot properly represent the fact that CR is saved in + a register. One reason is that we cannot express that multiple + CR fields are saved; another reason is that on 64-bit, the size + of the CR register in DWARF (4 bytes) differs from the size of + a general-purpose register. + + This means if any intervening instruction were to clobber one of + the call-saved CR fields, we'd have incorrect CFI. To prevent + this from happening, we mark the store to memory as a use of + those CR fields, which prevents any such instruction from being + scheduled in between the two instructions. */ + rtx crsave_v[9]; + int n_crsave = 0; + int i; + + crsave_v[n_crsave++] = gen_rtx_SET (VOIDmode, mem, cr_save_rtx); + for (i = 0; i < 8; i++) + if (save_reg_p (CR0_REGNO + i)) + crsave_v[n_crsave++] + = gen_rtx_USE (VOIDmode, gen_rtx_REG (CCmode, CR0_REGNO + i)); + + insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, + gen_rtvec_v (n_crsave, crsave_v))); END_USE (REGNO (cr_save_rtx)); - rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, - NULL_RTX, NULL_RTX); + /* Now, there's no way that dwarf2out_frame_debug_expr is going to + understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)', + so we need to construct a frame expression manually. */ + RTX_FRAME_RELATED_P (insn) = 1; + + /* Update address to be stack-pointer relative, like + rs6000_frame_related would do. */ + addr = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM), + GEN_INT (info->cr_save_offset + sp_off)); + mem = gen_frame_mem (SImode, addr); + + /* We still cannot express that multiple CR fields are saved in the + CR save slot. By convention, we use a single CR regnum to represent + the fact that all call-saved CR fields are saved. We use CR2_REGNO + to be compatible with gcc-2.95 on Linux. */ + rtx set = gen_rtx_SET (VOIDmode, mem, gen_rtx_REG (SImode, CR2_REGNO)); + add_reg_note (insn, REG_FRAME_RELATED_EXPR, set); } /* Update stack and set back pointer unless this is V.4, Index: gcc/gcc/config/rs6000/rs6000.md =================================================================== --- gcc.orig/gcc/config/rs6000/rs6000.md +++ gcc/gcc/config/rs6000/rs6000.md @@ -15058,6 +15058,14 @@ "mfcr %0" [(set_attr "type" "mfcr")]) +(define_insn "*crsave" + [(match_parallel 0 "crsave_operation" + [(set (match_operand:SI 1 "memory_operand" "=m") + (match_operand:SI 2 "gpc_reg_operand" "r"))])] + "" + "stw %2,%1" + [(set_attr "type" "store")]) + (define_insn "*stmw" [(match_parallel 0 "stmw_operation" [(set (match_operand:SI 1 "memory_operand" "=m") Index: gcc/gcc/config/rs6000/predicates.md =================================================================== --- gcc.orig/gcc/config/rs6000/predicates.md +++ gcc/gcc/config/rs6000/predicates.md @@ -1538,6 +1538,26 @@ return 1; }) +;; Return 1 if OP is valid for crsave insn, known to be a PARALLEL. +(define_predicate "crsave_operation" + (match_code "parallel") +{ + int count = XVECLEN (op, 0); + int i; + + for (i = 1; i < count; i++) + { + rtx exp = XVECEXP (op, 0, i); + + if (GET_CODE (exp) != USE + || GET_CODE (XEXP (exp, 0)) != REG + || GET_MODE (XEXP (exp, 0)) != CCmode + || ! CR_REGNO_P (REGNO (XEXP (exp, 0)))) + return 0; + } + return 1; +}) + ;; Return 1 if OP is valid for lmw insn, known to be a PARALLEL. (define_predicate "lmw_operation" (match_code "parallel") Index: gcc/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C =================================================================== --- /dev/null +++ gcc/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C @@ -0,0 +1,54 @@ +// { dg-do run { target { powerpc64*-*-linux* } } } +// { dg-options "-fexceptions -fnon-call-exceptions" } + +#include <signal.h> +#include <stdlib.h> +#include <fenv.h> + +#define SET_CR(R,V) __asm__ __volatile__ ("mtcrf %0,%1" : : "n" (1<<(7-R)), "r" (V<<(4*(7-R))) : "cr" #R) +#define GET_CR(R) ({ int tmp; __asm__ __volatile__ ("mfcr %0" : "=r" (tmp)); (tmp >> 4*(7-R)) & 15; }) + +void sighandler (int signo, siginfo_t * si, void * uc) +{ + SET_CR(2, 3); + SET_CR(3, 2); + SET_CR(4, 1); + + throw 0; +} + +float test (float a, float b) __attribute__ ((__noinline__)); +float test (float a, float b) +{ + float x; + asm ("mtcrf %1,%2" : "=f" (x) : "n" (1 << (7-3)), "r" (0), "0" (b) : "cr3"); + return a / x; +} + +int main () +{ + struct sigaction sa; + int status; + + sa.sa_sigaction = sighandler; + sa.sa_flags = SA_SIGINFO; + + status = sigaction (SIGFPE, & sa, NULL); + + feenableexcept (FE_DIVBYZERO); + + SET_CR(2, 6); + SET_CR(3, 9); + SET_CR(4, 12); + + try { + test (1, 0); + } + catch (...) { + return GET_CR(2) != 6 || GET_CR(3) != 9 || GET_CR(4) != 12; + } + + return 1; +} + + -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com