On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
> On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint 
> > > > > does not
> > > > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > > back.
> > > > > 
> > > > > Suggested-by: Maxim Levitsky <mlevi...@redhat.com>
> > > > > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> > > > > ---
> > > > >  target/i386/kvm/kvm.c | 9 +++++++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > > index 0b5755e42b..c8d61daf68 100644
> > > > > --- a/target/i386/kvm/kvm.c
> > > > > +++ b/target/i386/kvm/kvm.c
> > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState 
> > > > > *cs, struct kvm_sw_breakpoint *bp)
> > > > >  {
> > > > >      uint8_t int3;
> > > > > 
> > > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc 
> > > > > ||
> > > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 
> > > > > 1, 1)) {
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > > +        return -EINVAL;
> > > > > +    }
> > > > > +    if (int3 != 0xcc) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 
> > > > > 1, 1)) {
> > > > >          return -EINVAL;
> > > > >      }
> > > > >      return 0;
> > > > 
> > > > There still remains a philosopical question if 
> > > > kvm_arch_remove_sw_breakpoint
> > > > should always return 0, since for the usual case of kernel debugging 
> > > > where
> > > > a breakpoint is in unloaded module, the above will probably still fail
> > > > as paging for this module is removed as well by the kernel.
> > > > It is still better though so:
> > > > 
> > > > Reviewed-by: Maxim Levitsky <mlevi...@redhat.com>
> > > > 
> > > > Note that I managed to make lx-symbols to work in a very stable way
> > > > with attached WIP patch I hacked on this Sunday.
> > > > I will send a cleaned up version of it to upstream when I have time.
> > > > 
> > > > Since I make gdb unload the symbols, it works even without this patch.
> > > > 
> > > > Added Stefano Garzarella to CC as I know that he tried to make this 
> > > > work as well.
> > > > https://lkml.org/lkml/2020/10/5/514
> > > 
> > > Thanks Maxim for CCing me!
> > > 
> > > Just a report when I tried these patches, but I'm not sure they are
> > > related.
> > > 
> > > I found that gdb 10 has some problem with QEMU:
> > > 
> > >      $ gdb --version
> > >      GNU gdb (GDB) Fedora 10.1-2.fc33
> > > 
> > >      (gdb) lx-symbols
> > >      loading vmlinux
> > >      scanning for modules in linux/build
> > >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> > > 
> > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > > example), then I load the module (vsock_loopback in this example) in the
> > > guest.
> > > 
> > > Whit your patch gdb stuck after loading the symbols of the first new
> > > module:
> > >      (gdb) b vsock_core_register
> > >      Function "vsock_core_register" not defined.
> > >      Make breakpoint pending on future shared library load? (y or [n]) y
> > >      Breakpoint 1 (vsock_core_register) pending.
> > >      (gdb) c
> > >      Continuing.
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > 
> > > Without your patch, gdb loops infinitely reloading the new module:
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> > >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> > >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> > >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> > >      loading @0xffffffffc000d000: 
> > > linux/build/net/ipv4/netfilter/ip_tables.ko
> > >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      ...
> > > 
> > > I'll try to get a better look at what's going on.
> > 
> > Let me then explain what I found:
> > 
> > First of all initial execution of lx-symbols works and always worked for me
> > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
> > 
> > 
> > Then a special breakpoint is placed on do_init_module
> > (it is hidden from the user)
> > 
> > Once it is hit two things can happen:
> > 
> > 1. if a not yet seen module is loaded,
> >   (module that wasn't loaded last time all symbols were reloaded)
> >   its symbols are loaded to gdb with 'add-symbol-file' command.
> > 
> > 2. if module that was already loaded to gdb, is loaded (see above),
> > then 'big hammer' is used:
> > 
> > a. all existing breakpoints (including the hidden one)
> >    are disabled since reloading everything
> >    indeed messes up the gdb state
> > 
> > b. the executable's symbols (the kernel) are reloaded,
> >    which makes gdb unload all symbols, and then all symbols are loaded
> >    again (in the same way as lx-symbols works),
> >    including the symbols of currently loading module.
> > 
> > c. all breakpoints are enabled again
> > 
> > 
> > Now the issue that you originally reported on LKML is because the (1)
> > apparently also messes up the software breakpoints state in gdb,
> > and that makes gdb to not to temporary remove the breakpoint
> > in do_init_module once the execution is resumed, and then
> > the guest starts executing garbage (bytes after 'int3' instruction).
> > 
> > The second issue is that (2), aka the big hammer isn't really needed.
> > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> > command.
> > 
> > So it is possible to do remove-symbol-file/add-symbol-file'
> > on known module reload instead of reloading everything.
> > 
> > But this has a small issue. The issue is that such known module
> > was already reloaded, so all int3 instructions that gdb placed into
> > it are already gone, so sofware breakpoints placed to it won't work
> > This is what the patch that Paulo sent fixes.
> > 
> > However it is even better to create another hidden breakpoint on module 
> > removal
> > path (I used free_module for that) and unload the symbols there.
> > This allows the gdb to cleanly remove all software breakpoints
> > in that module, show them again as pending, and even re-install
> > them again once the module is loaded again.
> > 
> > So those are the two changes I made:
> > 
> > 1. I added a breakpoint on module removal which also does
> >  a. disable all breakpoints
> >  b. unload the module's symbols
> >  c. enable all breakpoints
> > 
> > 2. On module load I also do
> >  a. disable all breakpoints
> >  b. load the module's symbols
> >  c. enable all breakpoints
> > 
> > 
> > There is still an issue that both 'load' and 'unload' breakpoint
> > can hit more that once.
> > This happens because these are software breakpoints and
> > load/unload code in the kernel is executed with interrupts enabled.
> > 
> > So what is happening is that once the hidden breakpoint is hit, gdb script
> > attached to it is done, and VM is resumed, gdb does more or less this:
> > 
> > a. remove the breakpoint
> > b. do a single step
> > c. re-install the breakpoint.
> > 
> > However the single step more often than not, lands us into an interrupt
> > handler, and so once the handler is finished we end up re-executing the
> > instruction on which breakpoint was set.
> > On a single vCPU it works more or less (with several tries) on my machine,
> > but with many vCPUs, I can end up in live lock like state
> > when the above prevents the VM from progressing.
> > 
> > I think we can fix this on kvm side by not injecting interrupts
> > when single step is done.
> 
> Thank you so much for this detailed description, very much appreciated!
> 
> > In fact the below patch works for me,
> > Not only it fixes the live lock but makes these hidden breakpoints
> > hit only once in my testing.
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index eec62c0dafc36..8b7a4e27bcf66 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu 
> > *vcpu, bool *req_immediate_exit
> >                        goto busy;
> >        }
> > 
> > +       /*
> > +        * Don't inject interrupts while single stepping to make guest 
> > debug easier
> > +        */
> > +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > +               can_inject = false;
> > +
> > 
> > With this patch lx-symbols works almost perfectly for me (on AMD).
> 
> I'll try this patch!

Note that the patch to disable interrupt injection while single
stepping is wrong, due to some whatever mistake I made while rebasing things.
I attached an updated version.

With it (and the patch to lx-symbols itself which should be applied to the 
guest kernel),
I was able to run lx-symbols very well on both intel and AMD.

Note that I compile the guest kernel in the guest and install, but then I copy 
it
to the host, and I run the gdb from there. This is my debug.sh script:


vm adm hmp "gdbserver tcp::2345"
gdb ~/Kernel/vm-fedora/src/vmlinux \
        -ex "target remote :2345" \
        -ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
        -ex "lx-symbols" \
        -ex "cont"

Note that the debug must start after guest kernel was loaded.

Best regards,
        Maxim Levitsky


> 
> > The only remaing issue (that might be not easy to fix)
> > is that I still can't place breakpoints in __init module code.
> > That code is physically removed from the kernel once the module is done 
> > loading,
> > and it seems that its debug info isn't correct to even make a hardware
> > breakpoint work on it.
> > (gdb shows very small addresses for these symbols).
> > 
> > As for why this doesn't work for you I have 3 theories:
> > 
> > 1. The whole 'reload symbols on breakpoint' is forbidden
> > by gdb in the manual, that is one of the reasons that I had
> > to disable software breakpoints prior to doing this.
> > There might be other things that break in different gdb versions.
> > I don't see a way to make it work without doing this.
> 
> Maybe this can also be the reason why gdb 10 doesn't work for me.
> I should investigate further.
> 
> > 2. Maybe you test that on Intel and something is broken there
> >   on KVM level (I tested only AMD).
> 
> Yes, I'm on Intel.
> 
> > 3. Maybe you debug a nested guest? I haven't tested if guest debug
> > works fine in this configuration.
> 
> Nope, I'm not debugging nested guest in that case.
> 
> Thanks for your help,
> Stefano
> 

From c0fe19355f5c99bd8d1e6c5345fb32f259be8131 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevi...@redhat.com>
Date: Sun, 28 Feb 2021 23:52:08 +0200
Subject: [PATCH 1/2] gdb: rework gdb debug script

Signed-off-by: Maxim Levitsky <mlevi...@redhat.com>
---
 scripts/gdb/linux/symbols.py | 96 ++++++++++++++++++++++++++----------
 1 file changed, 71 insertions(+), 25 deletions(-)

diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb2..13f21afc2059d 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
 
 from linux import modules, utils
 
+def save_state():
+        breakpoints = []
+        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+            for bp in gdb.breakpoints():
+                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+                bp.enabled = False
+
+        show_pagination = gdb.execute("show pagination", to_string=True)
+        pagination = show_pagination.endswith("on.\n")
+        gdb.execute("set pagination off")
+
+        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+    for breakpoint in state["breakpoints"]:
+        breakpoint['breakpoint'].enabled = breakpoint['enabled']
+    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
+
 
 if hasattr(gdb, 'Breakpoint'):
     class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
             module_name = module['name'].string()
             cmd = self.gdb_command
 
+            # module already loaded, false alarm
+            if module_name in cmd.loaded_modules:
+                return False
+
             # enforce update if object file is not found
             cmd.module_files_updated = False
 
             # Disable pagination while reporting symbol (re-)loading.
             # The console input is blocked in this context so that we would
             # get stuck waiting for the user to acknowledge paged output.
-            show_pagination = gdb.execute("show pagination", to_string=True)
-            pagination = show_pagination.endswith("on.\n")
-            gdb.execute("set pagination off")
+            state = save_state()
+            cmd.load_module_symbols(module)
+            load_state(state)
+            return False
 
-            if module_name in cmd.loaded_modules:
-                gdb.write("refreshing all symbols to reload module "
-                          "'{0}'\n".format(module_name))
-                cmd.load_all_symbols()
-            else:
-                cmd.load_module_symbols(module)
+    class UnLoadModuleBreakpoint(gdb.Breakpoint):
+        def __init__(self, spec, gdb_command):
+            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+            self.silent = True
+            self.gdb_command = gdb_command
 
-            # restore pagination state
-            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+        def stop(self):
+            module = gdb.parse_and_eval("mod")
+            module_name = module['name'].string()
+            cmd = self.gdb_command
 
+            if not module_name in cmd.loaded_modules:
+                return False
+
+            state = save_state()
+            cmd.unload_module_symbols(module)
+            load_state(state)
             return False
 
 
@@ -64,8 +94,9 @@ lx-symbols command."""
     module_paths = []
     module_files = []
     module_files_updated = False
-    loaded_modules = []
+    loaded_modules = {}
     breakpoint = None
+    ubreakpoint = None
 
     def __init__(self):
         super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
@@ -129,21 +160,32 @@ lx-symbols command."""
                 filename=module_file,
                 addr=module_addr,
                 sections=self._section_arguments(module))
+
             gdb.execute(cmdline, to_string=True)
-            if module_name not in self.loaded_modules:
-                self.loaded_modules.append(module_name)
+            self.loaded_modules[module_name] = {"module_file": module_file,
+                                                "module_addr": module_addr}
         else:
             gdb.write("no module object found for '{0}'\n".format(module_name))
 
+    def unload_module_symbols(self, module):
+        module_name = module['name'].string()
+
+        module_file = self.loaded_modules[module_name]["module_file"]
+        module_addr = self.loaded_modules[module_name]["module_addr"]
+
+        gdb.write("unloading @{addr}: {filename}\n".format(
+            addr=module_addr, filename=module_file))
+        cmdline = "remove-symbol-file {filename}".format(
+            filename=module_file)
+
+        gdb.execute(cmdline, to_string=True)
+        del self.loaded_modules[module_name]
+
+
     def load_all_symbols(self):
         gdb.write("loading vmlinux\n")
 
-        # Dropping symbols will disable all breakpoints. So save their states
-        # and restore them afterward.
-        saved_states = []
-        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
-            for bp in gdb.breakpoints():
-                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
+        state = save_state()
 
         # drop all current symbols and reload vmlinux
         orig_vmlinux = 'vmlinux'
@@ -153,15 +195,14 @@ lx-symbols command."""
         gdb.execute("symbol-file", to_string=True)
         gdb.execute("symbol-file {0}".format(orig_vmlinux))
 
-        self.loaded_modules = []
+        self.loaded_modules = {}
         module_list = modules.module_list()
         if not module_list:
             gdb.write("no modules found\n")
         else:
             [self.load_module_symbols(module) for module in module_list]
 
-        for saved_state in saved_states:
-            saved_state['breakpoint'].enabled = saved_state['enabled']
+        load_state(state)
 
     def invoke(self, arg, from_tty):
         self.module_paths = [os.path.expanduser(p) for p in arg.split()]
@@ -177,8 +218,13 @@ lx-symbols command."""
             if self.breakpoint is not None:
                 self.breakpoint.delete()
                 self.breakpoint = None
-            self.breakpoint = LoadModuleBreakpoint(
-                "kernel/module.c:do_init_module", self)
+            self.breakpoint = LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
+
+            if self.ubreakpoint is not None:
+                self.ubreakpoint.delete()
+                self.ubreakpoint = None
+            self.ubreakpoint = UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
+
         else:
             gdb.write("Note: symbol update on module loading not supported "
                       "with this gdb version\n")
-- 
2.26.2

From 8619f7c836156b5704c32aca7b7808ef344989c3 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevi...@redhat.com>
Date: Wed, 3 Mar 2021 13:05:34 +0200
Subject: [PATCH 2/2] KVM: x86: Guest debug: don't inject interrupts while
 single stepping

Signed-off-by: Maxim Levitsky <mlevi...@redhat.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b140fce829286..ab3a598520ff4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8542,6 +8542,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 			goto busy;
 	}
 
+	/*
+	 * Don't inject interrupts while single stepping to make guest debug easier
+	 */
+	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+		return;
+
 	/*
 	 * Finally, inject interrupt events.  If an event cannot be injected
 	 * due to architectural conditions (e.g. IF=0) a window-open exit
-- 
2.26.2

Reply via email to