On 3/24/21 5:31 AM, Jakub Jelinek wrote:
Hi!
On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing
-mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
9: 41 ff 72 f8 pushq -0x8(%r10)
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 41 52 push %r10
1b: 53 push %rbx
1c: 48 83 ec 20 sub $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
00000018 0000000000000044 0000001c FDE cie=00000000
pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 13 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that. The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct. So, either we'd need to temporarily
say:
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
DW_CFA_advance_loc: 3 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
The following patch implements the latter. It keeps the call/jump and
flush queue handling as is and only moves the clobbers_queued_reg_save
forced flushes from before the instruction to immediately after the first
instruction. My reading of libgcc unwinder is that while for calls it
executes cfi instruction until < pc (but pc is the end of call anyway),
for async signal stops it executes cfi instruction until <= pc, so it
will only omit cfi instructions starting with advance to a higher loc.
On the testcase from the PR, the difference with the patch is:
@@ -16,9 +16,9 @@ Contents of the .eh_frame section:
00000018 0000000000000044 0000001c FDE cie=00000000
pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
- DW_CFA_advance_loc: 9 to 000000000000000e
+ DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
- DW_CFA_advance_loc: 13 to 000000000000001b
+ DW_CFA_advance_loc: 10 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
When looking at other changes, e.g. on cc1plus itself when it has
identical objdump -dr, the without patch to with patch difference
is e.g.:
0005f388 000000000000002c 0005f38c FDE cie=00000000
pc=0000000000e2fc80..0000000000e2fe8d
DW_CFA_advance_loc: 1 to 0000000000e2fc81
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 3 to 0000000000e2fc84
DW_CFA_def_cfa_register: r6 (rbp)
- DW_CFA_advance_loc: 6 to 0000000000e2fc8a
+ DW_CFA_advance_loc: 9 to 0000000000e2fc8d
DW_CFA_offset: r15 (r15) at cfa-24
DW_CFA_offset: r14 (r14) at cfa-32
DW_CFA_offset: r13 (r13) at cfa-40
- DW_CFA_advance_loc: 6 to 0000000000e2fc90
+ DW_CFA_advance_loc: 5 to 0000000000e2fc92
DW_CFA_offset: r12 (r12) at cfa-48
DW_CFA_offset: r3 (rbx) at cfa-56
- DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
+ DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
DW_CFA_remember_state
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_advance_loc: 1 to 0000000000e2fe20
DW_CFA_restore_state
DW_CFA_nop
DW_CFA_nop
for
0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
e2fc80: 55 push %rbp
e2fc81: 48 89 e5 mov %rsp,%rbp
e2fc84: 41 57 push %r15
e2fc86: 41 56 push %r14
e2fc88: 41 55 push %r13
e2fc8a: 45 31 ed xor %r13d,%r13d
e2fc8d: 41 54 push %r12
e2fc8f: 53 push %rbx
e2fc90: 31 db xor %ebx,%ebx
e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp
e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp)
...
prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
starts applying before or after the xorl %r13d,%r13d instruction
and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
before or after the xorl %ebx,%ebx instruction.
Or:
-00061590 0000000000000020 00061594 FDE cie=00000000
pc=0000000000e3d160..0000000000e3d1af
+00061590 000000000000001c 00061594 FDE cie=00000000
pc=0000000000e3d160..0000000000e3d1af
DW_CFA_advance_loc: 1 to 0000000000e3d161
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 9 to 0000000000e3d16a
DW_CFA_def_cfa_register: r6 (rbp)
- DW_CFA_advance_loc: 2 to 0000000000e3d16c
+ DW_CFA_advance_loc: 5 to 0000000000e3d16f
DW_CFA_offset: r12 (r12) at cfa-24
- DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
+ DW_CFA_advance_loc: 63 to 0000000000e3d1ae
DW_CFA_def_cfa: r7 (rsp) ofs 8
- DW_CFA_nop
- DW_CFA_nop
- DW_CFA_nop
for:
0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
e3d160: 55 push %rbp
e3d161: 48 63 c2 movslq %edx,%rax
e3d164: 49 89 f8 mov %rdi,%r8
e3d167: 48 89 e5 mov %rsp,%rbp
e3d16a: 41 54 push %r12
e3d16c: 49 89 c4 mov %rax,%r12
e3d16f: 48 83 ec 08 sub $0x8,%rsp
again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
before movq %rax,%r12 instruction or after it.
I think if for some instructions it is possible that debugger or signal
would stop in the middle of instruction, with some side-effects of the
instruction already done and others not yet, then we can't do that,
but at the start of instructions that modify the register the side-effects
of the instruction should not have taken effect yet.
Bootstrapped/regtested on x86_64-linux and i686-linux.
LGTM. This seems consistent with when we flush due to clobbers in
dwarf2out_frame_debug.
Will e.g. GDB be happy about the changes?
I would expect so, but it would be good to have someone from GDB verify.
Thinking some more about this, what can be problematic on the GCC side
is inline asm, that can and often will contain multiple instructions.
For that is an easy fix, just test asm_noperands and handle
clobbers_queued_reg_save before the insn rather than after in that case.
Sure, but does inline asm go through dwarf2out_frame_debug?
But there is another problem, instruction patterns that emit multiple
hw instructions, code can stop in between them. So, do we want some
instruction attribute that would conservatively tell us which instruction
patterns are guaranteed to be single machine instructions?
It seems to me that in that situation you'd want to add the save to
*%rsp, and still not update to *%rbp until after the combined instruction.
Incidentally, it seems a bit odd that clobbers_queued_reg_save returns
true both for clobbering the saved register and for clobbering the save
location; if I already know that %rbp is saved at *%rsp, why do I care
that the insn we're looking at clobbers %rbp? Maybe we should only
check modified_in_p (q->reg, insn) if q->reg doesn't appear in
regs_saved_in_regs. That wouldn't help with this testcase, though.
2021-03-24 Jakub Jelinek <ja...@redhat.com>
PR debug/99334
* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
forced flushes of queued register saves for non-call/jump insns
that can't throw, emit them first after the insn rather than
before it.
--- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100
+++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100
@@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
dwarf2out_flush_queued_reg_saves ();
}
else if (!NONJUMP_INSN_P (insn)
- || clobbers_queued_reg_save (insn)
|| find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
dwarf2out_flush_queued_reg_saves ();
any_cfis_emitted = false;
add_cfi_insn = insn;
+
+ if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
+ dwarf2out_flush_queued_reg_saves ();
+
scan_insn_after (insn);
control = insn;
}
Jakub