On Thu, Jul 08, 2021 at 02:49:51AM +0000, Wang, Wei W wrote:
> On Thursday, July 8, 2021 12:44 AM, Peter Xu wrote:
> > > > Not to mention the hard migration issues are mostly with non-idle
> > > > guests; in that case having the balloon in the guest will be
> > > > disastrous from this pov since it'll start to take the mutex for each
> > > > page, while the balloon would hardly report anything valid since most
> > > > guest pages are being used.
> > >
> > > If no pages are reported, the migration thread wouldn't wait on the lock then.
> >
> > Yes, I think this is the place where I didn't make myself clear.  It's not
> > about sleeping, it's about the cmpxchg being expensive already when the VM
> > is huge.
>
> OK.
> How did you root-cause that it's the cmpxchg, instead of lock contention
> (i.e. syscall and sleep) or some other code inside pthread_mutex_lock()?
> Do you have cycle counts for the cmpxchg vs. cycles spent in
> pthread_mutex_lock()?

We've got "perf top -g" showing a huge amount of stacks lying in
pthread_mutex_lock().

> I checked the implementation of pthread_mutex_lock().  The code path for
> lock acquire is long.  QemuSpin looks more efficient.
> (Probably we also don't want the migration thread to sleep in any case.)

I didn't check it, but I would hope it's very close to a spinlock when it's
not contended.

We should be careful about using spinlocks in userspace; e.g., with a
spinlock, moving the clear-log call into the critical section would be too
much and actually close to "wrong", imho, because the kvm memslot lock taken
inside is sleepable.

I think it's very fine to have the migration thread sleep.  IMHO we need
explicit justification for each mutex we convert to a spinlock in userspace,
and so far I don't see it for this bitmap lock.

Frankly, I also don't know how a spinlock would work reliably in userspace
without being able to disable scheduling: could the thread get scheduled out
while holding the spinlock?  Would another thread trying to take the lock
then spin on that sleeping task?
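To illustrate the concern, here is a minimal, generic test-and-set spinlock
sketch (purely illustrative, not QEMU's actual QemuSpin code; the
user_spinlock name and its helpers are made up for this example):

#include <stdatomic.h>
#include <sched.h>

/* Hypothetical type for this sketch only; not a QEMU or pthread API. */
typedef struct {
    atomic_flag value;
} user_spinlock;

static inline void user_spin_init(user_spinlock *l)
{
    atomic_flag_clear(&l->value);
}

static inline void user_spin_lock(user_spinlock *l)
{
    /*
     * Busy-wait until the flag was previously clear.  Nothing here can
     * prevent the kernel from preempting the current lock holder, so every
     * waiter burns CPU until the holder is scheduled again.  If the critical
     * section itself can sleep (e.g. an ioctl that takes a sleepable lock in
     * the kernel), that spinning can last a long time.
     */
    while (atomic_flag_test_and_set_explicit(&l->value, memory_order_acquire)) {
        sched_yield();  /* give a possibly-preempted holder a chance to run */
    }
}

static inline void user_spin_unlock(user_spinlock *l)
{
    atomic_flag_clear_explicit(&l->value, memory_order_release);
}

The sched_yield() only mitigates the busy-waiting; it doesn't remove the
fundamental problem that userspace cannot keep the holder running the way a
kernel spinlock can by disabling preemption.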
> I think it's also better to see the comparison of migration throughput data
> (i.e. pages per second) in the following cases, before we make a decision:
> - per-page mutex
> - per-page spinlock
> - 50-ms mutex

I don't have the machines, so I can't do this.  I also see the spinlock as a
separate issue to solve: as I said before, I would prefer some justification
for it first rather than blindly running tests and picking the patch with the
higher number.

I also hope we can reach a consensus that we fix one issue at a time.  This
patch already proves itself with some real workloads, so I hope we can merge
it first, either with a 50ms period or less.  Per-page locking already goes
against my instinct of how this should be done; I just regret that I didn't
see it as an issue when reviewing the balloon patch, as I offered the r-b
myself.  For now I want to make it work as it did before; then we can figure
out a better approach for how the lock is taken, and which lock we should
use.  I don't see it as a must to do all of this in one patch.

Thanks,

--
Peter Xu