Emilio G. Cota <c...@braap.org> writes:

> On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.f...@gmail.com>
>>
>> This patch is based on the ideas found in work of KONRAD Frederic [1],
>> Alex Bennée [2], and Alvise Rigo [3].
>>
>> This mechanism allows to perform an operation safely in a quiescent
>> state. Quiescent state means: (1) no vCPU is running and (2) BQL in
>> system-mode or 'exclusive_lock' in user-mode emulation is held while
>> performing the operation. This functionality is required e.g. for
>> performing translation buffer flush safely in multi-threaded user-mode
>> emulation.
>>
>> The existing CPU work queue is used to schedule such safe operations. A
>> new 'safe' flag is added into struct qemu_work_item to designate the
>> special requirements of the safe work. An operation in a quiescent sate
>
> s/sate/state/
>
> (snip)
>> index a233f01..6d5da15 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -25,6 +25,7 @@
>>
>>  bool exit_request;
>>  CPUState *tcg_current_cpu;
>> +int tcg_pending_threads;
>>
>>  /* exit the current TB, but without causing any exception to be raised */
>>  void cpu_loop_exit_noexc(CPUState *cpu)
>> @@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>>  }
>>
>>  QemuCond qemu_work_cond;
>> +QemuCond qemu_safe_work_cond;
>> +QemuCond qemu_exclusive_cond;
>> +
>> +static int safe_work_pending;
>> +
>> +#ifdef CONFIG_USER_ONLY
>> +#define can_wait_for_safe() (1)
>> +#else
>> +/*
>> + * We never sleep in SoftMMU emulation because we would deadlock as
>> + * all vCPUs are in the same thread. This will change for MTTCG
>> + * however.
>> + */
>> +#define can_wait_for_safe() (0)
>> +#endif
>> +
>> +void wait_safe_cpu_work(void)
>> +{
>> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>
> The atomic here is puzzling, see below.
>
>> +        /*
>> +         * If there is pending safe work and no pending threads we
>> +         * need to signal another thread to start its work.
>> +         */
>> +        if (tcg_pending_threads == 0) {
>> +            qemu_cond_signal(&qemu_exclusive_cond);
>> +        }
>> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> +    }
>> +}
>>
>>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>  {
>> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>      cpu->queued_work_last = wi;
>>      wi->next = NULL;
>>      wi->done = false;
>> +    if (wi->safe) {
>> +        atomic_inc(&safe_work_pending);
>> +    }
>
> This doesn't seem right. Operating on the condvar's shared 'state' variable
> should always be done with the condvar's mutex held. Otherwise, there's
> no guarantee that sleepers will always see a consistent state when they're
> woken up, which can easily lead to deadlock.

How so? Surely the barriers around the atomic accesses and the implicit
barriers of the mutexes ensure it is?
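
To make sure we are arguing about the same thing, here is a minimal
pthreads sketch of the locking discipline I understand you to be
describing (the names are made up for illustration, this is not the
actual QEMU code):

  #include <pthread.h>

  static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
  static int pending;   /* only read or written with work_mutex held */

  static void work_queued(void)
  {
      pthread_mutex_lock(&work_mutex);
      pending++;
      pthread_mutex_unlock(&work_mutex);
  }

  static void work_completed(void)
  {
      pthread_mutex_lock(&work_mutex);
      if (--pending == 0) {
          /*
           * Broadcast with the mutex held: any waiter is either already
           * blocked in pthread_cond_wait() or has not yet tested
           * 'pending', so the wake-up cannot be lost.
           */
          pthread_cond_broadcast(&work_cond);
      }
      pthread_mutex_unlock(&work_mutex);
  }

  static void wait_for_work_done(void)
  {
      pthread_mutex_lock(&work_mutex);
      while (pending > 0) {
          pthread_cond_wait(&work_cond, &work_mutex);
      }
      pthread_mutex_unlock(&work_mutex);
  }

If I have understood your objection, the hazard is the classic lost
wake-up: when the counter is updated with a bare atomic and the wake-up
is signalled without the mutex held, the change can land between a
waiter's test of the counter and its cond_wait, and the waiter sleeps
through the signal.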

>
> I suspect this is what caused the deadlock you saw in the last iteration
> of the series.
>
> An additional requirement is the fact that new CPUs can come anytime in
> user-mode (imagine we're flushing the TB while a new pthread was just
> spawned). This is easily triggered by greatly reducing the size of the
> translation buffer, and spawning dozens of threads.

I don't suppose you have a test case written up for this already? My
kvm-unit-tests are fairly extensive for SoftMMU mode but for user-mode I
was only using pigz with the TB buffer scaled down. Obviously I need to
expand the user-mode testing.

> This patch, as it
> stands, won't catch the new threads coming in, because at the time
> "safe work" was assigned, the new threads might not be seen by
> CPU_FOREACH (btw, the CPU list should be converted to RCU, but a
> ppc machine might be affected, see [1])

That sounds like a separate patch to RCUify.

>
> A possible fix is to sched safe work after exiting the CPU loop, i.e.
> with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
> and doesn't scale very well on 64 cores (too much contention
> on tb_lock), although at least it doesn't deadlock.

Where exactly? Surely tb_lock contention shouldn't be a problem as it is
only held for generation and patching now?

> An alternative is to have a separate lock for safe work, and check for
> safe work once there are no other locks held; a good place to do this is
> at the beginning of cpu_loop_exec. This scales better, and I'd argue
> it's simpler. In fact, I posted a patch that does this about a year
> ago (!):
> https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html

I'll have another look at this. One thing I prefer about this series is
it keeps all the work mechanisms together. I think that's worth striving
for if we can.

> Paolo didn't like condvars, but now I see them coming up again. I guess
> he still won't like the synchronize_rcu() call in there, and I don't like
> it either, but I don't think that's an essential part of that patch.
>
> Thanks,
>
> Emilio
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02581.html

--
Alex Bennée