Hi,

On 19/04/16 16:39, Alvise Rigo wrote:
> This patch series provides an infrastructure for atomic instruction
> implementation in QEMU, thus offering a 'legacy' solution for
> translating guest atomic instructions. Moreover, it can be considered as
> a first step toward a multi-thread TCG.
>
> The underlying idea is to provide new TCG helpers (sort of softmmu
> helpers) that guarantee atomicity to some memory accesses or in general
> a way to define memory transactions.
>
> More specifically, the new softmmu helpers behave as LoadLink and
> StoreConditional instructions, and are called from TCG code by means of
> target specific helpers. This work includes the implementation for all
> the ARM atomic instructions, see target-arm/op_helper.c.
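Just to illustrate what these primitives buy us: once LoadLink (LL) and
StoreConditional (SC) helpers are available, compare-and-swap can be
layered on top of them roughly as sketched below. The ll() and sc()
functions are placeholder names standing in for the new helpers, not the
names actually used in this series.

#include <stdbool.h>
#include <stdint.h>

/* Placeholders for the new LL/SC helpers. */
uint32_t ll(uint32_t *addr);             /* load value and link addr    */
bool sc(uint32_t *addr, uint32_t val);   /* store only if link unbroken */

static inline bool cas32(uint32_t *addr, uint32_t expected, uint32_t desired)
{
    uint32_t old;

    do {
        old = ll(addr);
        if (old != expected) {
            return false;                /* value differs: CAS fails    */
        }
        /* sc() fails whenever the location was written since ll(), even
         * if the old value was restored, which is why LL/SC-based code
         * does not suffer from the ABA problem. */
    } while (!sc(addr, desired));

    return true;
}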
I think it is a generally good idea to provide LL/SC TCG operations for
emulating guest atomic instruction behaviour, as those operations make it
easy to implement other atomic primitives such as compare-and-swap (as
sketched above) and atomic arithmetic. Another advantage of these
operations is that they are free from the ABA problem.

> The implementation heavily uses the software TLB together with a new
> bitmap that has been added to the ram_list structure which flags, on a
> per-CPU basis, all the memory pages that are in the middle of a LoadLink
> (LL), StoreConditional (SC) operation. Since all these pages can be
> accessed directly through the fast-path and alter a vCPU's linked value,
> the new bitmap has been coupled with a new TLB flag for the TLB virtual
> address which forces the slow-path execution for all the accesses to a
> page containing a linked address.

But I'm afraid we've got a scalability problem with such heavy use of the
software TLB engine. This approach relies on flushing the TLB of all CPUs,
which is not a very cheap operation. It is going to be even more expensive
in the case of MTTCG, as you need to exit the CPU execution loop in order
to avoid deadlocks.

I see you try to mitigate this issue by introducing a history of the N
last pages touched by an exclusive access. That would work fine, avoiding
excessive TLB flushes, as long as the current working set of exclusively
accessed pages does not go beyond N. Once we exceed this limit, we'll get
a global TLB flush on most LL operations, so I'm afraid we can see a
dramatic performance decrease as guest code implements finer-grained
locking schemes; I would like to emphasise how sharply performance can
degrade as soon as the limit gets exceeded. How could we tackle this
problem? A simplified model of the eviction behaviour I have in mind is
sketched below.

Kind regards,
Sergey
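For concreteness, here is a simplified model of the eviction behaviour
described above. This is not the code from the series; the names, the
history size and the data structure are made up, and the sketch assumes
it sits next to the existing cputlb code so that CPUState, CPU_FOREACH()
and tlb_flush() are in scope.

#define EXCL_HISTORY_SIZE 8                /* the 'N' discussed above */

static uint64_t excl_history[EXCL_HISTORY_SIZE];
static unsigned excl_history_next;

static void note_exclusive_page(uint64_t page_addr)
{
    CPUState *cpu;
    unsigned i;

    for (i = 0; i < EXCL_HISTORY_SIZE; i++) {
        if (excl_history[i] == page_addr) {
            return;     /* page already flagged: the cheap, common case
                           while the working set fits in the history */
        }
    }

    /* Eviction path: record the new page in place of the oldest entry and
     * flush every vCPU's TLB so that their fast path stops touching the
     * page behind our back.  With MTTCG this also means kicking each vCPU
     * out of its execution loop.  Once the working set of exclusively
     * accessed pages exceeds EXCL_HISTORY_SIZE, nearly every LoadLink
     * ends up here. */
    excl_history[excl_history_next] = page_addr;
    excl_history_next = (excl_history_next + 1) % EXCL_HISTORY_SIZE;

    CPU_FOREACH(cpu) {
        tlb_flush(cpu, 1);
    }
}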