Peter Maydell <peter.mayd...@linaro.org> writes:
> On Mon, 30 Sep 2019 at 14:17, Doug Gale <doug...@gmail.com> wrote:
>>
>> I found a lockup in single threaded TCG, with OVMF bios, 16 CPUs.
>>
>> qemu-system-x86_64 --accel tcg,thread=single -smp cpus=16 -bios
>> /usr/share/ovmf/OVMF.fd
>>
>> Using Ubuntu 18.04 LTS, default gnome desktop. There is no guest OS,
>> there is no hard drive, just the OVMF firmware locks it up. (I
>> narrowed it down to this from a much larger repro)
>
>> Peter Maydell helped me bisect it in IRC.
>> Works fine at commit 1e8a98b53867f61
>> Fails at commit 9458a9a1df1a4c7
>>
>> MTTCG works fine all the way up to master.
>
> Thanks for this bug report. I've reproduced it and think
> I have figured out what is going on here.
>
> Commit 9458a9a1df1a4c719e245 is Paolo's commit that fixes the
> TCG-vs-dirty-bitmap race by having the thread which is
> doing a memory access wait for the vCPU thread to finish
> its current TB using a no-op run_on_cpu() operation.
>
> In the case of the hang the thread doing the memory access
> is the iothread, like this:
>
> #14 0x000055c150c0a98c in run_on_cpu (cpu=0x55c153801c60,
>     func=0x55c150bbb542 <do_nothing>, data=...)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/cpus.c:1205
> #15 0x000055c150bbb58c in tcg_log_global_after_sync
>     (listener=0x55c1538410c8) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/exec.c:2963
> #16 0x000055c150c1fe18 in memory_global_after_dirty_log_sync () at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/memory.c:2598
> #17 0x000055c150c1e6b8 in memory_region_snapshot_and_clear_dirty
>     (mr=0x55c1541e82b0, addr=0, size=1920000, client=0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/memory.c:2106
> #18 0x000055c150c76c05 in vga_draw_graphic (s=0x55c1541e82a0, full_update=0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/hw/display/vga.c:1661
> #19 0x000055c150c771c4 in vga_update_display (opaque=0x55c1541e82a0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/hw/display/vga.c:1785
> #20 0x000055c151052a83 in graphic_hw_update (con=0x55c1536dfaa0) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:268
> #21 0x000055c151091490 in gd_refresh (dcl=0x55c1549af090) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/ui/gtk.c:484
> #22 0x000055c151056571 in dpy_refresh (s=0x55c1542f9d90) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:1629
> #23 0x000055c1510527f0 in gui_update (opaque=0x55c1542f9d90) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:206
> #24 0x000055c1511ee67c in timerlist_run_timers
>     (timer_list=0x55c15370c280) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:592
> #25 0x000055c1511ee726 in qemu_clock_run_timers
>     (type=QEMU_CLOCK_REALTIME) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:606
> #26 0x000055c1511ee9e5 in qemu_clock_run_all_timers () at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:692
> #27 0x000055c1511ef181 in main_loop_wait (nonblocking=0) at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/util/main-loop.c:524
> #28 0x000055c150db03fe in main_loop () at
>     /home/petmay01/linaro/qemu-from-laptop/qemu/vl.c:1806
>
> run_on_cpu() adds the do_nothing function to a cpu queued-work list.
>
> However, the single-threaded TCG runloop qemu_tcg_rr_cpu_thread_fn()
> has the basic structure:
>
>    while (1) {
>        for (each vCPU) {
>            run this vCPU for a timeslice;
>        }
>        qemu_tcg_rr_wait_io_event();
>    }
>
> and it processes cpu work queues only in qemu_tcg_rr_wait_io_event().
> This is a problem, because the thing which causes us to stop running
> one vCPU when its timeslice ends and move on to the next is the
> tcg_kick_vcpu_timer -- and the iothread will never process that timer
> and kick the vcpu because it is currently blocked in run_on_cpu() !
>
> Not sure currently what the best fix is here.

We seem to be repeating ourselves here, because:

  a8efa60633575a2ee4dbf807a71cb44d44b0e0f8
  Author:     Paolo Bonzini <pbonz...@redhat.com>
  AuthorDate: Wed Nov 14 12:36:57 2018 +0100

    cpus: run work items for all vCPUs if single-threaded

However, looking at that commit I can see we still have the problem of
not advancing to the next vCPU if the kick timer (or some other event)
doesn't force execution to exit. I suspect you could hit this with a
Linux guest as well, but Linux is probably busy enough that its vCPUs
are always exiting for one reason or another.

So the options I can think of so far are:

1. Kick 'em all when not inter-vCPU

Something like this untested patch...

--8<---------------cut here---------------start------------->8---
1 file changed, 17 insertions(+), 5 deletions(-)
cpus.c | 22 +++++++++++++++++-----

modified   cpus.c
@@ -949,8 +949,8 @@ static inline int64_t qemu_tcg_next_kick(void)
     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }
 
-/* Kick the currently round-robin scheduled vCPU */
-static void qemu_cpu_kick_rr_cpu(void)
+/* Kick the currently round-robin scheduled vCPU to next */
+static void qemu_cpu_kick_rr_next_cpu(void)
 {
     CPUState *cpu;
     do {
@@ -961,6 +961,16 @@ static void qemu_cpu_kick_rr_cpu(void)
     } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
 }
 
+/* Kick all RR vCPUs */
+static void qemu_cpu_kick_rr_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_exit(cpu);
+    };
+}
+
 static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
 {
 }
@@ -993,7 +1003,7 @@ void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
 static void kick_tcg_thread(void *opaque)
 {
     timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    qemu_cpu_kick_rr_cpu();
+    qemu_cpu_kick_rr_next_cpu();
 }
 
 static void start_tcg_kick_timer(void)
@@ -1828,9 +1838,11 @@ void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
+        if (qemu_tcg_mttcg_enabled()) {
             cpu_exit(cpu);
-        /* NOP unless doing single-thread RR */
-        qemu_cpu_kick_rr_cpu();
+        } else {
+            qemu_cpu_kick_rr_cpus();
+        }
     } else {
         if (hax_enabled()) {
             /*
--8<---------------cut here---------------end--------------->8---

2. Add handling to do_run_on_cpu to kick all vCPUs when
   current_cpu == NULL

Which would basically kick all vCPUs whenever an event comes from
outside the emulation (a rough sketch of what this might look like is
appended at the end of this mail).

3. Figure out multi-threaded icount and record/replay and drop the
   special RR case

This might take a while.

> Side note -- this use of run_on_cpu() means that we now drop
> the iothread lock within memory_region_snapshot_and_clear_dirty(),
> which we didn't before. This means that a vCPU thread can now
> get in and execute an access to the device registers of
> hw/display/vga.c, updating its state in a way I suspect that the
> device model code is not expecting... So maybe the right answer
> here should be to come up with a fix for the race that 9458a9a1
> addresses that doesn't require us to drop the iothread lock in
> memory_region_snapshot_and_clear_dirty() ? Alternatively we need
> to audit the callers and flag in the documentation that this
> function has the unexpected side effect of briefly dropping the
> iothread lock.
There was also a series Emilio posted to get rid of more of the BQL in
favour of per-CPU locks, which IIRC also stopped some of the bouncing
we do in the *_on_cpu functions. Every place where we have to do a lock
shuffle to get things moving is a bit of a red flag.

>
> thanks
> -- PMM

--
Alex Bennée
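
P.S. For reference, a minimal and equally untested sketch of what
option 2 might look like, assuming do_run_on_cpu() in cpus-common.c
keeps its current shape (queue the work item, then spin on wi.done);
whether linux-user would want the same kick-everything behaviour is
glossed over here:

--8<---------------cut here---------------start------------->8---
modified   cpus-common.c
@@ ... @@ void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
     queue_work_on_cpu(cpu, &wi);
+
+    /* If the caller is not a vCPU thread (e.g. the iothread),
+     * current_cpu is NULL and the target vCPU may not be the one the
+     * single-threaded RR loop is currently executing, so kick every
+     * vCPU to make sure the loop comes back round to the point where
+     * it drains the queued work.
+     */
+    if (!current_cpu) {
+        CPUState *other_cpu;
+
+        CPU_FOREACH(other_cpu) {
+            qemu_cpu_kick(other_cpu);
+        }
+    }
+
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
--8<---------------cut here---------------end--------------->8---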