Hi,

On 19/04/16 16:39, Alvise Rigo wrote:
> This patch series provides an infrastructure for atomic instruction
> implementation in QEMU, thus offering a 'legacy' solution for
> translating guest atomic instructions. Moreover, it can be considered as
> a first step toward a multi-thread TCG.
>
> The underlying idea is to provide new TCG helpers (sort of softmmu
> helpers) that guarantee atomicity to some memory accesses or in general
> a way to define memory transactions.
>
> More specifically, the new softmmu helpers behave as LoadLink and
> StoreConditional instructions, and are called from TCG code by means of
> target specific helpers. This work includes the implementation for all
> the ARM atomic instructions, see target-arm/op_helper.c.
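Just to illustrate what these primitives buy us: once LoadLink (LL) and
StoreConditional (SC) helpers are available, compare-and-swap can be
layered on top of them roughly as sketched below. The ll() and sc()
functions are placeholder names standing in for the new helpers, not the
names actually used in this series.

#include <stdbool.h>
#include <stdint.h>

/* Placeholders for the new LL/SC helpers. */
uint32_t ll(uint32_t *addr);             /* load value and link addr    */
bool sc(uint32_t *addr, uint32_t val);   /* store only if link unbroken */

static inline bool cas32(uint32_t *addr, uint32_t expected, uint32_t desired)
{
    uint32_t old;

    do {
        old = ll(addr);
        if (old != expected) {
            return false;                /* value differs: CAS fails    */
        }
        /* sc() fails whenever the location was written since ll(), even
         * if the old value was restored, which is why LL/SC-based code
         * does not suffer from the ABA problem. */
    } while (!sc(addr, desired));

    return true;
}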
I think it is a generally good idea to provide LL/SC TCG operations for
emulating guest atomic instruction behaviour, as those operations make it
easy to implement other atomic primitives such as compare-and-swap (as
sketched above) and atomic arithmetic. Another advantage of these
operations is that they are free from the ABA problem.

> The implementation heavily uses the software TLB together with a new
> bitmap that has been added to the ram_list structure which flags, on a
> per-CPU basis, all the memory pages that are in the middle of a LoadLink
> (LL), StoreConditional (SC) operation. Since all these pages can be
> accessed directly through the fast-path and alter a vCPU's linked value,
> the new bitmap has been coupled with a new TLB flag for the TLB virtual
> address which forces the slow-path execution for all the accesses to a
> page containing a linked address.

But I'm afraid we've got a scalability problem with such heavy use of the
software TLB engine. This approach relies on flushing the TLB of all CPUs,
which is not a very cheap operation. It is going to be even more expensive
in the case of MTTCG, as you need to exit the CPU execution loop in order
to avoid deadlocks.

I see you try to mitigate this issue by introducing a history of the N
last pages touched by an exclusive access. That would work fine, avoiding
excessive TLB flushes, as long as the current working set of exclusively
accessed pages does not go beyond N. Once we exceed this limit, we'll get
a global TLB flush on most LL operations, so I'm afraid we can see a
dramatic performance decrease as guest code implements finer-grained
locking schemes; I would like to emphasise how sharply performance can
degrade as soon as the limit gets exceeded. How could we tackle this
problem? A simplified model of the eviction behaviour I have in mind is
sketched below.

Kind regards,
Sergey
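For concreteness, here is a simplified model of the eviction behaviour
described above. This is not the code from the series; the names, the
history size and the data structure are made up, and the sketch assumes
it sits next to the existing cputlb code so that CPUState, CPU_FOREACH()
and tlb_flush() are in scope.

#define EXCL_HISTORY_SIZE 8                /* the 'N' discussed above */

static uint64_t excl_history[EXCL_HISTORY_SIZE];
static unsigned excl_history_next;

static void note_exclusive_page(uint64_t page_addr)
{
    CPUState *cpu;
    unsigned i;

    for (i = 0; i < EXCL_HISTORY_SIZE; i++) {
        if (excl_history[i] == page_addr) {
            return;     /* page already flagged: the cheap, common case
                           while the working set fits in the history */
        }
    }

    /* Eviction path: record the new page in place of the oldest entry and
     * flush every vCPU's TLB so that their fast path stops touching the
     * page behind our back.  With MTTCG this also means kicking each vCPU
     * out of its execution loop.  Once the working set of exclusively
     * accessed pages exceeds EXCL_HISTORY_SIZE, nearly every LoadLink
     * ends up here. */
    excl_history[excl_history_next] = page_addr;
    excl_history_next = (excl_history_next + 1) % EXCL_HISTORY_SIZE;

    CPU_FOREACH(cpu) {
        tlb_flush(cpu, 1);
    }
}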