Emilio G. Cota <c...@braap.org> writes:

> On Wed, Aug 03, 2016 at 22:02:04 +0100, Alex Bennée wrote:
>> Emilio G. Cota <c...@braap.org> writes:
>>
>> > On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
> (snip)
>> >> +void wait_safe_cpu_work(void)
>> >> +{
>> >> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>> >
>> > The atomic here is puzzling, see below.
>> >
>> >> +        /*
>> >> +         * If there is pending safe work and no pending threads we
>> >> +         * need to signal another thread to start its work.
>> >> +         */
>> >> +        if (tcg_pending_threads == 0) {
>> >> +            qemu_cond_signal(&qemu_exclusive_cond);
>> >> +        }
>> >> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> >> +    }
>> >> +}
>> >>
>> >>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>> >>  {
>> >> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>> >>      cpu->queued_work_last = wi;
>> >>      wi->next = NULL;
>> >>      wi->done = false;
>> >> +    if (wi->safe) {
>> >> +        atomic_inc(&safe_work_pending);
>> >> +    }
>> >
>> > This doesn't seem right. Operating on the condvar's shared 'state' variable
>> > should always be done with the condvar's mutex held. Otherwise, there's
>> > no guarantee that sleepers will always see a consistent state when they're
>> > woken up, which can easily lead to deadlock.
>>
>> How so? Surely the barriers around the atomic accesses and the implicit
>> barriers of the mutexes ensure it is?
>
> Barriers guarantee that accesses will be perceived in the right order.
> However, they do not determine *when* those accesses will be seen by
> other CPUs. For that we need stronger primitives, i.e. atomics like
> the ones embedded in locks. Otherwise we might end up with races that
> are very hard to debug.
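If I'm reading the objection right, the change you have in mind for
queue_work_on_cpu() is roughly the following (sketch only, reusing the
names from the patch above; exactly where the lock is taken and dropped
is my guess):

/* sketch: update the condvar's shared counter with the work mutex held,
 * so a waiter woken in wait_safe_cpu_work() always sees a value that is
 * consistent with the condition it is sleeping on */
qemu_mutex_lock(qemu_get_cpu_work_mutex());
if (wi->safe) {
    safe_work_pending++;   /* no atomic needed once under the lock */
}
qemu_mutex_unlock(qemu_get_cpu_work_mutex());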
But that's what we have, atomic incs followed by a kick (cpu_exit/signal
conds) to wake up threads.

FWIW I did fix a problem with the MTTCG logic last night to do with sleeping:

  https://github.com/stsquad/qemu/commit/2f4f3ed149cfe87794a3bd4adfccf04b4cd81873

>
>> > I suspect this is what caused the deadlock you saw in the last iteration
>> > of the series.
>> >
>> > An additional requirement is the fact that new CPUs can come anytime in
>> > user-mode (imagine we're flushing the TB while a new pthread was just
>> > spawned). This is easily triggered by greatly reducing the size of the
>> > translation buffer, and spawning dozens of threads.
>>
>> I don't suppose you have a test case written up for this already?
>>
>> My kvm-unit-tests are fairly extensive for SoftMMU mode but for
>> user-mode I was only using pigz with the TB buffer scaled down.
>> Obviously I need to expand the user-mode testing.
>
> A tiny TB buffer (redefining the constants in translate-all.c), plus
> a program running under linux-user that spawns many threads that do actual
> work is a good test.

Yeah, this is what the pigz test does. I made the buffer very small and
had several clashing flush events, but it was all stable. A more
self-contained stress program along the lines of the sketch further down
this mail would also do the job.

>
>> > A possible fix is to sched safe work after exiting the CPU loop, i.e.
>> > with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
>> > and it doesn't scale very well on 64 cores (too much contention
>> > on tb_lock), although at least it doesn't deadlock.
>>
>> Where exactly? Surely tb_lock contention shouldn't be a problem as it is
>> only held for generation and patching now?
>
> Booting up 64 cores on x86_64 can show contention on a 64-core host,
> since CPU kicks are frequent. Do you have this v5 + mttcg + cmpxchg in a
> branch so that I can test?

You can have async-work-v5 + base-patches-v4-wip:

  https://github.com/stsquad/qemu/commits/mttcg/base-patches-v4-04082016-for-emilio

Last time I experimented with merging your cmpxchg work it went in
without any conflicts, so hopefully that is still the case now. This
tree currently runs all of the kvm-unit-tests for ARMv7 and v8 in
-accel tcg,thread=single mode and runs the tcg and tlbflush test groups
in -accel tcg,thread=multi mode. It looks like I have some silly
compile errors to fix for Travis to be happy though.

>> > An alternative is to have a separate lock for safe work, and check for
>> > safe work once there are no other locks held; a good place to do this is
>> > at the beginning of cpu_loop_exec. This scales better, and I'd argue
>> > it's simpler. In fact, I posted a patch that does this about a year
>> > ago (!):
>> > https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html
>>
>> I'll have another look at this. One thing I prefer about this series is
>> it keeps all the work mechanisms together. I think that's worth striving
>> for if we can.
>
> Sure. I don't think that old patchset has much value apart from raising
> some issues that aren't mentioned in this series.
>
> By the way, before even considering the merge of this patchset, I think
> we should look at merging the cmpxchg work first (at least for x86)
> so that we can thoroughly test this set at least with linux-user. Otherwise
> we'll see errors/segfaults and we won't know what caused them.

Yes, I reckon the cmpxchg and barrier work should get merged first as it
is both testable and worthwhile for the existing linux-user case.
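As for the stress program mentioned above, here is a rough sketch. It is
illustrative only (the thread count, rounds and workload are arbitrary,
and it is not the exact test I have been running); the point is lots of
short-lived pthreads doing real work so that a deliberately shrunken
code-gen buffer forces frequent flushes while new threads keep
appearing. Build with something like "gcc -O2 -pthread -static" and run
it under the target's linux-user binary:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 64   /* dozens of threads, as suggested */
#define ROUNDS   32   /* respawn them repeatedly so new CPUs keep arriving */

static void *worker(void *arg)
{
    unsigned long x = (unsigned long)arg;
    long i;

    for (i = 0; i < 10 * 1000 * 1000; i++) {
        x = x * 1103515245UL + 12345;   /* just keep the guest CPU busy */
    }
    return (void *)x;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    int r, i;

    for (r = 0; r < ROUNDS; r++) {
        for (i = 0; i < NTHREADS; i++) {
            if (pthread_create(&threads[i], NULL, worker,
                               (void *)(unsigned long)i)) {
                fprintf(stderr, "pthread_create failed\n");
                exit(1);
            }
        }
        for (i = 0; i < NTHREADS; i++) {
            pthread_join(threads[i], NULL);
        }
    }
    printf("all rounds done\n");
    return 0;
}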
If we are happy that async-work is fit for purpose in SoftMMU mode as
well, I would expect it to get merged ahead of the main base enabling
set. In fact, since the base patches have pulled in a bunch of the
cputlb patches from the ARM enabling series, I think we are pretty close
to:

  base-patches-v4 + any functioning atomic fix = support for all platforms ;-)

There are a few wrinkles that need to be checked out for each platform
though, so I still expect we'll be enabling combos as we go. This mainly
has to do with things like booting up CPUs in an MTTCG-safe way.

--
Alex Bennée