Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]

Jason Merrill via Gcc-patches Thu, 25 Mar 2021 14:44:18 -0700

On 3/24/21 5:31 AM, Jakub Jelinek wrote:

Hi!


On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing 
-mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
    0:  4c 8d 54 24 08          lea    0x8(%rsp),%r10
    5:  48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    9:  41 ff 72 f8             pushq  -0x8(%r10)
    d:  55                      push   %rbp
    e:  48 89 e5                mov    %rsp,%rbp
   11:  41 57                   push   %r15
   13:  41 56                   push   %r14
   15:  41 55                   push   %r13
   17:  41 54                   push   %r12
   19:  41 52                   push   %r10
   1b:  53                      push   %rbx
   1c:  48 83 ec 20             sub    $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
   Version:               1
   Augmentation:          "zR"
   Code alignment factor: 1
   Data alignment factor: -8
   Return address column: 16
   Augmentation data:     1b
   DW_CFA_def_cfa: r7 (rsp) ofs 8
   DW_CFA_offset: r16 (rip) at cfa-8
   DW_CFA_nop
   DW_CFA_nop

00000018 0000000000000044 0000001c FDE cie=00000000 
pc=0000000000000000..00000000000001d5
   DW_CFA_advance_loc: 5 to 0000000000000005
   DW_CFA_def_cfa: r10 (r10) ofs 0
   DW_CFA_advance_loc: 9 to 000000000000000e
   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
   DW_CFA_advance_loc: 13 to 000000000000001b
   DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
   DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
   DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
   DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
   DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that.  The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct.  So, either we'd need to temporarily
say:
   DW_CFA_advance_loc: 9 to 000000000000000e
   DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
   DW_CFA_advance_loc: 3 to 0000000000000011
   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
   DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
   DW_CFA_advance_loc: 12 to 0000000000000011
   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
   DW_CFA_advance_loc: 10 to 000000000000001b

The following patch implements the latter.  It keeps the call/jump and
flush queue handling as is and only moves the clobbers_queued_reg_save
forced flushes from before the instruction to immediately after the first
instruction.  My reading of libgcc unwinder is that while for calls it
executes cfi instruction until < pc (but pc is the end of call anyway),
for async signal stops it executes cfi instruction until <= pc, so it
will only omit cfi instructions starting with advance to a higher loc.
On the testcase from the PR, the difference with the patch is:
@@ -16,9 +16,9 @@ Contents of the .eh_frame section:
  00000018 0000000000000044 0000001c FDE cie=00000000 
pc=0000000000000000..00000000000001d5
    DW_CFA_advance_loc: 5 to 0000000000000005
    DW_CFA_def_cfa: r10 (r10) ofs 0
-  DW_CFA_advance_loc: 9 to 000000000000000e
+  DW_CFA_advance_loc: 12 to 0000000000000011
    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
-  DW_CFA_advance_loc: 13 to 000000000000001b
+  DW_CFA_advance_loc: 10 to 000000000000001b
    DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
    DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
    DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
When looking at other changes, e.g. on cc1plus itself when it has
identical objdump -dr, the without patch to with patch difference
is e.g.:
  0005f388 000000000000002c 0005f38c FDE cie=00000000 
pc=0000000000e2fc80..0000000000e2fe8d
    DW_CFA_advance_loc: 1 to 0000000000e2fc81
    DW_CFA_def_cfa_offset: 16
    DW_CFA_offset: r6 (rbp) at cfa-16
    DW_CFA_advance_loc: 3 to 0000000000e2fc84
    DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 6 to 0000000000e2fc8a
+  DW_CFA_advance_loc: 9 to 0000000000e2fc8d
    DW_CFA_offset: r15 (r15) at cfa-24
    DW_CFA_offset: r14 (r14) at cfa-32
    DW_CFA_offset: r13 (r13) at cfa-40
-  DW_CFA_advance_loc: 6 to 0000000000e2fc90
+  DW_CFA_advance_loc: 5 to 0000000000e2fc92
    DW_CFA_offset: r12 (r12) at cfa-48
    DW_CFA_offset: r3 (rbx) at cfa-56
-  DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
+  DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
    DW_CFA_remember_state
    DW_CFA_def_cfa: r7 (rsp) ofs 8
    DW_CFA_advance_loc: 1 to 0000000000e2fe20
    DW_CFA_restore_state
    DW_CFA_nop
    DW_CFA_nop
for
0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
   e2fc80:       55                      push   %rbp
   e2fc81:       48 89 e5                mov    %rsp,%rbp
   e2fc84:       41 57                   push   %r15
   e2fc86:       41 56                   push   %r14
   e2fc88:       41 55                   push   %r13
   e2fc8a:       45 31 ed                xor    %r13d,%r13d
   e2fc8d:       41 54                   push   %r12
   e2fc8f:       53                      push   %rbx
   e2fc90:       31 db                   xor    %ebx,%ebx
   e2fc92:       48 81 ec 98 02 00 00    sub    $0x298,%rsp
   e2fc99:       89 7d c4                mov    %edi,-0x3c(%rbp)
...
prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
starts applying before or after the xorl %r13d,%r13d instruction
and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
before or after the xorl %ebx,%ebx instruction.
Or:
-00061590 0000000000000020 00061594 FDE cie=00000000 
pc=0000000000e3d160..0000000000e3d1af
+00061590 000000000000001c 00061594 FDE cie=00000000 
pc=0000000000e3d160..0000000000e3d1af
    DW_CFA_advance_loc: 1 to 0000000000e3d161
    DW_CFA_def_cfa_offset: 16
    DW_CFA_offset: r6 (rbp) at cfa-16
    DW_CFA_advance_loc: 9 to 0000000000e3d16a
    DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 2 to 0000000000e3d16c
+  DW_CFA_advance_loc: 5 to 0000000000e3d16f
    DW_CFA_offset: r12 (r12) at cfa-24
-  DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
+  DW_CFA_advance_loc: 63 to 0000000000e3d1ae
    DW_CFA_def_cfa: r7 (rsp) ofs 8
-  DW_CFA_nop
-  DW_CFA_nop
-  DW_CFA_nop
for:
0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
   e3d160:       55                      push   %rbp
   e3d161:       48 63 c2                movslq %edx,%rax
   e3d164:       49 89 f8                mov    %rdi,%r8
   e3d167:       48 89 e5                mov    %rsp,%rbp
   e3d16a:       41 54                   push   %r12
   e3d16c:       49 89 c4                mov    %rax,%r12
   e3d16f:       48 83 ec 08             sub    $0x8,%rsp
again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
before movq %rax,%r12 instruction or after it.
I think if for some instructions it is possible that debugger or signal
would stop in the middle of instruction, with some side-effects of the
instruction already done and others not yet, then we can't do that,
but at the start of instructions that modify the register the side-effects
of the instruction should not have taken effect yet.

Bootstrapped/regtested on x86_64-linux and i686-linux.

LGTM. This seems consistent with when we flush due to clobbers indwarf2out_frame_debug.

Will e.g. GDB be happy about the changes?


I would expect so, but it would be good to have someone from GDB verify.

Thinking some more about this, what can be problematic on the GCC side
is inline asm, that can and often will contain multiple instructions.
For that is an easy fix, just test asm_noperands and handle
clobbers_queued_reg_save before the insn rather than after in that case.


Sure, but does inline asm go through dwarf2out_frame_debug?

But there is another problem, instruction patterns that emit multiple
hw instructions, code can stop in between them.  So, do we want some
instruction attribute that would conservatively tell us which instruction
patterns are guaranteed to be single machine instructions?

It seems to me that in that situation you'd want to add the save to*%rsp, and still not update to *%rbp until after the combined instruction.

Incidentally, it seems a bit odd that clobbers_queued_reg_save returnstrue both for clobbering the saved register and for clobbering the savelocation; if I already know that %rbp is saved at *%rsp, why do I carethat the insn we're looking at clobbers %rbp? Maybe we should onlycheck modified_in_p (q->reg, insn) if q->reg doesn't appear inregs_saved_in_regs. That wouldn't help with this testcase, though.

2021-03-24  Jakub Jelinek  <ja...@redhat.com>

        PR debug/99334
        * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
        forced flushes of queued register saves for non-call/jump insns
        that can't throw, emit them first after the insn rather than
        before it.

--- gcc/dwarf2cfi.c.jj  2021-03-02 11:25:47.217727061 +0100
+++ gcc/dwarf2cfi.c     2021-03-23 17:34:58.240281522 +0100
@@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
              dwarf2out_flush_queued_reg_saves ();
            }
          else if (!NONJUMP_INSN_P (insn)
-                  || clobbers_queued_reg_save (insn)
                   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
            dwarf2out_flush_queued_reg_saves ();
          any_cfis_emitted = false;

add_cfi_insn = insn;

+
+         if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
+           dwarf2out_flush_queued_reg_saves ();
+
          scan_insn_after (insn);
          control = insn;
        }


        Jakub

Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]

Reply via email to