There are some situations where this patch still doesn't help.
I think this is due to a race condition in qemu_tcg_rr_wait_io_event():

static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
{
    while (all_cpu_threads_idle()) {
        stop_tcg_kick_timer();
        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
    }

    start_tcg_kick_timer();

    qemu_wait_io_event_common(cpu);
}

all_cpu_threads_idle() returns true when there is no queued work.
But between this call and qemu_cond_wait(), the iothread may add queued work,
and the vCPU thread will then sleep forever.

Does anyone have an idea how to fix this?

Pavel Dovgalyuk

> -----Original Message-----
> From: Pavel Dovgalyuk [mailto:pavel.dovga...@ispras.ru]
> Sent: Tuesday, July 03, 2018 11:53 AM
> To: qemu-devel@nongnu.org
> Cc: alex.ben...@linaro.org; pbonz...@redhat.com;
> maria.klimushenk...@ispras.ru;
> dovga...@ispras.ru; pavel.dovga...@ispras.ru
> Subject: [PATCH] replay: wake up vCPU when replaying
>
> In record/replay icount mode vCPU thread and iothread synchronize
> the execution using the checkpoints.
> vCPU thread processes the virtual timers and iothread processes all others.
> When iothread wants to wake up sleeping vCPU thread, it sends dummy queued
> work. Therefore it could be the following sequence of the events in
> record mode:
>  - IO: sending dummy work
>  - IO: processing timers
>  - CPU: wakeup
>  - CPU: clearing dummy work
>  - CPU: processing virtual timers
>
> But due to the races in replay mode the sequence may change:
>  - IO: sending dummy work
>  - CPU: wakeup
>  - CPU: clearing dummy work
>  - CPU: sleeping again because nothing to do
>  - IO: Processing timers
>  - CPU: zzzz
>
> In this case vCPU will not wake up, because dummy work is not to be set up
> again.
>
> This patch tries to wake up the vCPU when it sleeps and the icount warp
> checkpoint isn't met. It means that vCPU has something to do, because
> there are no other reasons of non-matching warp checkpoint.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovga...@ispras.ru>
> ---
>  cpus.c |   15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 181ce33..bad6a33 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -539,11 +539,6 @@ void qemu_start_warp_timer(void)
>          return;
>      }
>
> -    /* warp clock deterministically in record/replay mode */
> -    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
> -        return;
> -    }
> -
>      if (!all_cpu_threads_idle()) {
>          return;
>      }
> @@ -553,6 +548,16 @@ void qemu_start_warp_timer(void)
>          return;
>      }
>
> +    /* warp clock deterministically in record/replay mode */
> +    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
> +        /* vCPU is sleeping and warp can't be started.
> +           It is probably a race condition: notification sent
> +           to vCPU was processed in advance and vCPU went to sleep.
> +           Therefore we have to wake it up for doing someting. */
> +        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +        return;
> +    }
> +
>      /* We want to use the earliest deadline from ALL vm_clocks */
>      clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
>      deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
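
For reference, the lost-wakeup pattern described at the top of this message can be shown with a standalone pthread sketch. This is only an illustration of the general technique, not a proposed change to cpus.c: the names lock, halt_cond, work_queued, vcpu_thread, io_thread and run_demo are hypothetical stand-ins, and it assumes the producer can set the "work queued" flag under the same mutex the sleeper passes to the condition wait. Under that assumption the check and the sleep cannot be separated by the producer, so the wakeup is never lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of these are QEMU symbols. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
static bool work_queued;   /* protected by lock */
static int work_done;      /* protected by lock */

/* Plays the role of the sleeping vCPU thread. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /*
     * The predicate is tested with the lock held, and pthread_cond_wait()
     * releases the lock and starts waiting atomically. The producer below
     * needs the same lock to set work_queued, so it cannot slip in between
     * the test and the sleep.
     */
    while (!work_queued) {
        pthread_cond_wait(&halt_cond, &lock);
    }
    work_queued = false;   /* consume the queued work */
    work_done++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Plays the role of the iothread queuing work and kicking the vCPU. */
static void *io_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    work_queued = true;
    pthread_cond_signal(&halt_cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs one producer and one consumer; returns how many items were handled. */
int run_demo(void)
{
    pthread_t cpu, io;

    pthread_mutex_lock(&lock);
    work_queued = false;
    work_done = 0;
    pthread_mutex_unlock(&lock);

    pthread_create(&cpu, NULL, vcpu_thread, NULL);
    pthread_create(&io, NULL, io_thread, NULL);
    pthread_join(io, NULL);
    pthread_join(cpu, NULL);
    return work_done;
}
```

If the producer instead set the flag under a different lock (or none) and signalled without holding lock, the consumer could test the predicate, see no work, and then block after the signal had already been sent, which is the hang described above.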