On 20.02.2016 at 08:11, Pavel Dovgalyuk wrote:
> > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > On 16.02.2016 at 12:20, Pavel Dovgalyuk wrote:
> > > > Coroutine                                   Replay
> > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > >                                             co = req_completed_list_get(e.reqid); // NULL
> > >
> > > There was no yield, this context switch is impossible to happen. Same
> > > for the switch back.
> > >
> > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > qemu_coroutine_yield();
> > >
> > > This is the point at which a context switch happens. The only other
> > > point in my code is the qemu_coroutine_enter() in the other function.
> >
> > I've fixed aio_poll problem by disabling mutex lock for the
> > replay_run_block_event() execution. Now virtual machine deterministically
> > runs 4e8 instructions of Windows XP booting.

Are you sure that the lock was unnecessary? Solving deadlocks by
removing the lock is a rather adventurous method.

I assume that you're still replaying events from low-level functions
like qemu_clock_get_ns(). So even if you get rid of the hangs, the
result is probably not quite right. I'm afraid this is going in a
direction where my comments can't be more constructive than a simple
"you're doing it wrong".

> > But then one non-deterministic event happens.
> > Callback after finishing coroutine may be called from different contexts.

How does this happen? I'm not aware of callbacks being processed by any
thread other than the I/O thread for that specific block device (unless
you use dataplane, this is the main loop thread).

> > apic_update_irq() function behaves differently being called from vcpu and
> > io threads.
> > In one case it sets CPU_INTERRUPT_POLL and in other - nothing happens.
>
> Kevin, do you have some ideas how to fix this issue?
> This happens because of coroutines may be assigned to different threads.
> Maybe there is some way of making this assignment more deterministic?

Coroutines aren't randomly assigned to threads, but threads actively
enter coroutines. To my knowledge this happens only when starting a
request (either vcpu or I/O thread; consistent per device) or by a
callback when some event happens (only I/O thread).

I can't see any non-determinism here.

Kevin

> > Therefore execution becomes non-deterministic.
> > In previous version of the patch I solved this problem by linking block
> > events to the execution checkpoints. IO thread have its own checkpoints
> > and vcpu - its own.
> > Therefore apic callbacks are always called from the same thread in replay
> > as in recording phase.
> >
> > Pavel Dovgalyuk
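
To illustrate the point above that threads actively enter coroutines, here
is a toy model in Python. It is only a sketch and not QEMU code: a generator
stands in for a coroutine, the `yield` stands in for qemu_coroutine_yield(),
and the names `request`, `enter` and `run_request` are made up for this
example. It shows that each segment of the coroutine runs on whichever thread
resumed it, so the thread that sees the completion is decided by the caller
of the enter function, not by the coroutine.

```python
import threading


def request(log):
    """Toy coroutine: record which thread runs each segment."""
    log.append(("start", threading.current_thread().name))
    yield  # the only point where control leaves the coroutine
    log.append(("completion", threading.current_thread().name))


def enter(co):
    """Resume the coroutine on the calling thread (hypothetical stand-in
    for qemu_coroutine_enter())."""
    try:
        next(co)
    except StopIteration:
        pass  # the coroutine ran to completion


def run_request(starter_name, completer_name):
    """Start the request on one named thread and complete it on another."""
    log = []
    co = request(log)
    for name in (starter_name, completer_name):
        t = threading.Thread(target=enter, args=(co,), name=name)
        t.start()
        t.join()  # strictly sequential: never two threads in the coroutine
    return log


if __name__ == "__main__":
    print(run_request("vcpu", "io"))
    print(run_request("io", "io"))
```

In this model the second segment always runs on the thread passed as
`completer_name`. If a completion is seen on an unexpected thread, the thing
to look for is which caller entered the coroutine at that point.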