Emilio G. Cota <c...@braap.org> writes:

> On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.f...@gmail.com>
>>
>> This patch is based on the ideas found in work of KONRAD Frederic [1],
>> Alex Bennée [2], and Alvise Rigo [3].
>>
>> This mechanism allows to perform an operation safely in a quiescent
>> state. Quiescent state means: (1) no vCPU is running and (2) BQL in
>> system-mode or 'exclusive_lock' in user-mode emulation is held while
>> performing the operation. This functionality is required e.g. for
>> performing translation buffer flush safely in multi-threaded user-mode
>> emulation.
>>
>> The existing CPU work queue is used to schedule such safe operations. A
>> new 'safe' flag is added into struct qemu_work_item to designate the
>> special requirements of the safe work. An operation in a quiescent sate
>
> s/sate/state/
>
> (snip)
>> index a233f01..6d5da15 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -25,6 +25,7 @@
>>
>>  bool exit_request;
>>  CPUState *tcg_current_cpu;
>> +int tcg_pending_threads;
>>
>>  /* exit the current TB, but without causing any exception to be raised */
>>  void cpu_loop_exit_noexc(CPUState *cpu)
>> @@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>>  }
>>
>>  QemuCond qemu_work_cond;
>> +QemuCond qemu_safe_work_cond;
>> +QemuCond qemu_exclusive_cond;
>> +
>> +static int safe_work_pending;
>> +
>> +#ifdef CONFIG_USER_ONLY
>> +#define can_wait_for_safe() (1)
>> +#else
>> +/*
>> + * We never sleep in SoftMMU emulation because we would deadlock as
>> + * all vCPUs are in the same thread. This will change for MTTCG
>> + * however.
>> + */
>> +#define can_wait_for_safe() (0)
>> +#endif
>> +
>> +void wait_safe_cpu_work(void)
>> +{
>> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>
> The atomic here is puzzling, see below.
>
>> +        /*
>> +         * If there is pending safe work and no pending threads we
>> +         * need to signal another thread to start its work.
>> +         */
>> +        if (tcg_pending_threads == 0) {
>> +            qemu_cond_signal(&qemu_exclusive_cond);
>> +        }
>> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> +    }
>> +}
>>
>>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>  {
>> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>      cpu->queued_work_last = wi;
>>      wi->next = NULL;
>>      wi->done = false;
>> +    if (wi->safe) {
>> +        atomic_inc(&safe_work_pending);
>> +    }
>
> This doesn't seem right. Operating on the condvar's shared 'state' variable
> should always be done with the condvar's mutex held. Otherwise, there's
> no guarantee that sleepers will always see a consistent state when they're
> woken up, which can easily lead to deadlock.

How so? Surely the barriers around the atomic accesses and the implicit
barriers of the mutexes ensure it is?
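
To make sure we are arguing about the same thing, here is a minimal
pthreads sketch of the locking discipline I understand you to be
describing (the names are made up for illustration, this is not the
actual QEMU code):

  #include <pthread.h>

  static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
  static int pending;   /* only read or written with work_mutex held */

  static void work_queued(void)
  {
      pthread_mutex_lock(&work_mutex);
      pending++;
      pthread_mutex_unlock(&work_mutex);
  }

  static void work_completed(void)
  {
      pthread_mutex_lock(&work_mutex);
      if (--pending == 0) {
          /*
           * Broadcast with the mutex held: any waiter is either already
           * blocked in pthread_cond_wait() or has not yet tested
           * 'pending', so the wake-up cannot be lost.
           */
          pthread_cond_broadcast(&work_cond);
      }
      pthread_mutex_unlock(&work_mutex);
  }

  static void wait_for_work_done(void)
  {
      pthread_mutex_lock(&work_mutex);
      while (pending > 0) {
          pthread_cond_wait(&work_cond, &work_mutex);
      }
      pthread_mutex_unlock(&work_mutex);
  }

If I have understood your objection, the hazard is the classic lost
wake-up: when the counter is updated with a bare atomic and the wake-up
is signalled without the mutex held, the change can land between a
waiter's test of the counter and its cond_wait, and the waiter sleeps
through the signal.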

>
> I suspect this is what caused the deadlock you saw in the last iteration
> of the series.
>
> An additional requirement is the fact that new CPUs can come anytime in
> user-mode (imagine we're flushing the TB while a new pthread was just
> spawned). This is easily triggered by greatly reducing the size of the
> translation buffer, and spawning dozens of threads.

I don't suppose you have a test case written up for this already? My
kvm-unit-tests are fairly extensive for SoftMMU mode but for user-mode I
was only using pigz with the TB buffer scaled down. Obviously I need to
expand the user-mode testing.

> This patch, as it
> stands, won't catch the new threads coming in, because at the time
> "safe work" was assigned, the new threads might not be seen by
> CPU_FOREACH (btw, the CPU list should be converted to RCU, but a
> ppc machine might be affected, see [1])

That sounds like a separate patch to RCUify.

>
> A possible fix is to sched safe work after exiting the CPU loop, i.e.
> with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
> and doesn't scale very well on 64 cores (too much contention
> on tb_lock), although at least it doesn't deadlock.

Where exactly? Surely tb_lock contention shouldn't be a problem as it is
only held for generation and patching now?

> An alternative is to have a separate lock for safe work, and check for
> safe work once there are no other locks held; a good place to do this is
> at the beginning of cpu_loop_exec. This scales better, and I'd argue
> it's simpler. In fact, I posted a patch that does this about a year
> ago (!):
> https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html

I'll have another look at this. One thing I prefer about this series is
it keeps all the work mechanisms together. I think that's worth striving
for if we can.

> Paolo didn't like condvars, but now I see them coming up again. I guess
> he still won't like the synchronize_rcu() call in there, and I don't like
> it either, but I don't think that's an essential part of that patch.
>
> Thanks,
>
> Emilio
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02581.html

--
Alex Bennée