Nabih Estefan <nabiheste...@google.com> writes:

> This patch series modifies the gdbstub to address a bug running a
> multi cluster machine in QEMU using TCG.

Was this a downstream multi-cluster machine? Do we have any examples for
upstream? It would be nice to add a gdbstub test case to cover the
multi-inferior behaviour.

> The machine where the
> problem was seen had several clusters of CPUs with similar
> architectures and similar memory layout all working with physical
> addresses. It was discovered under gdb debugging that a breakpoint
> targeting one cluster misfired on the wrong cluster quite frequently
> with no possible workaround since gdb was also unaware of any
> breakpoint in that cluster and simply reported SIGTRAP.
>
> The sequence that discovered the bug adds N inferiors and adds a
> breakpoint on inferior N. Then after continuing an unrelated thread
> stops the execution when its PC hits the same address of the break
> targeting a different inferior.
>
> target extended-remote :1234
> add-inferior
> inferior 2
> attach 2
> ...
> add-inferior
> inferior N
> attach N
> add-symbol-file /path/to/foo.elf
> break foo
>> Breakpoint 1 at 0xf00add
> info break
>> Num     Type           Disp Enb Address    What
>> 1       breakpoint     keep y   0x00f00add in foo
>>                                            at foo.c:1234 inf N
> continue
>> Continuing.
>>
>> Thread 2.1 received signal SIGTRAP, Trace/breakpoint trap.
>> [Switching to Thread 2.2]
>> 0xf00add in ?? ()
>
> The main problem is the current implementation of
> 'tcg_insert_breakpoint' and 'tcg_remove_breakpoint' insert and remove
> breakpoints to all the CPUs in the system regardless of what the
> remote gdb protocol implements.
>
> If we look at the current source code of GDB we can see that the
> function 'remote_target::insert_breakpoint' in file gdb/remote.c has
> the intention to select the process ID of the inferior where the
> break was inserted.
>
> int
> remote_target::insert_breakpoint (struct gdbarch *gdbarch,
>                                   struct bp_target_info *bp_tgt)
> {
> ...
>   /* Make sure the remote is pointing at the right process, if
>      necessary.  */
>   if (!gdbarch_has_global_breakpoints (current_inferior ()->arch ()))
>     set_general_process ();
> ...
> }
>
> https:sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/remote.c;h=2c3988cb5075655e8a799d1cc5d4760ad8ed426e;hb=HEAD#l11023
>
> This would not happen when we input the 'break' in gdb but it is
> deferred until the time we execute the 'continue' command. Because we
> might be currently selecting an inferior that is not the one where we
> previously set the breakpoint, the remote gdb has to make sure we
> move the focus to the process ID of the inferior where we inserted
> the break.
>
> In the previous example this will translate to something like:
>
> HgpN.M
> Z0,00f00add,4
>
> Another thing that is wrong with the current implementation (and it
> affects both TCG and KVM mode) is that the remote gdb protocol uses
> 'Hg' and not 'Hc' to select the process. Functions
> 'gdb_breakpoint_insert' and 'gdb_breakpoint_remove' receive wrongly
> 'gdbserver_state.c_cpu' instead of 'gdbserver_state.g_cpu'.
>
> This is supported by the documentation of 'H op thread-id' where op =
> 'c' is reserved to the step and continue:
>
> https:sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html
>
> And it can be verified that the function 'set_general_process' in the
> previous code snippet will eventually call
> 'remote_target::set_general_thread' and not
> 'remote_target::set_continue_thread' if it needs to change focus.
>
> A third scenario that has to be taken into account is the case of a
> break on a specific thread, for instance the sequence:
>
> inferior 1
> break bar thread 1.3
> break bar thread 1.4
>
> The remote protocol expects the gdbstub to apply the break to the
> process ID of inferior 1 and considers the specific thread-id as a
> breakpoint condition (not too different from a 'break if').
>
> In this case the packet exchange may look like:
>
> Hgp1.1
> Z0,00ba2add,4
>
> There wouldn't be an independent set of packets for 'Hgp1.3' and
> 'Hgp1.4'.
>
> In the gdb source code, the handling of the specific thread-id
> happens during breakpoint evaluation in function
> 'bpstat_check_breakpoint_conditions' of file gdb/breakpoint.c
>
> https:sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/breakpoint.c;h=17bd627f867cf3d4dc81322ed1919ba40cbb237d;hb=HEAD#l5550
>
> The proposed fix inserts or removes a breakpoint to all the cpus
> sharing the same process ID as the one selected with the latest
> received 'Hg' packet.
>
> Roque Arcudia Hernandez (2):
>   gdbstub: Fix wrong CPUState pointer in breakpoint functions
>   gdbstub: Apply breakpoints only to the selected PID
>
>  accel/tcg/tcg-accel-ops.c | 37 +++++++++++++++++++++++--------------
>  gdbstub/gdbstub.c         | 10 ++++++++--
>  gdbstub/internals.h       | 13 +++++++++++--
>  include/exec/gdbstub.h    | 12 ++++++++++++
>  4 files changed, 54 insertions(+), 18 deletions(-)

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Reply via email to