On 02/03/17 10:38, Dario Faggioli wrote:
> During load balancing, we check the non-idle pCPUs to
> see if they have runnable but not running vCPUs that
> can be stolen by, and set to run on, currently idle pCPUs.
>
> If a pCPU has only one running (or runnable) vCPU,
> though, we don't want to steal it from there, and
> it's therefore pointless bothering with it
> (especially considering that bothering means trying
> to take its runqueue lock!).
>
> On large systems, when load is only slightly higher
> than the number of pCPUs (i.e., there are just a few
> more active vCPUs than the number of the pCPUs), this
> may mean that:
>  - we go through all the pCPUs,
>  - for each one, we (try to) take its runqueue lock,
>  - we figure out there's actually nothing to be stolen!
>
> To mitigate this, we introduce here the concept of
> overloaded runqueues, and a cpumask in which to record
> which pCPUs are in that state.
>
> An overloaded runqueue has at least 2 runnable vCPUs
> (plus the idle one, which is always there). Typically,
> this means 1 vCPU is running, and 1 is sitting in the
> runqueue, and can hence be stolen.
>
> Then, in csched_balance_load(), it is enough to go
> over the overloaded pCPUs, instead of all non-idle
> pCPUs, which avoids taking the runqueue locks of pCPUs
> that have nothing to steal.
>
> Signed-off-by: Dario Faggioli <dario.faggi...@citrix.com>
> ---
> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>

Malcolm’s solution to this problem is
https://github.com/xenserver/xen-4.7.pg/commit/0f830b9f229fa6472accc9630ad16cfa42258966

This has been in two releases of XenServer now, and gives a very visible
improvement in aggregate multi-queue, multi-VM intrahost network
performance (although I can't find the numbers right now).

The root of the performance problems is that pcpu_schedule_trylock() is
expensive even in the local case, and cross-CPU locking is much
worse.  Locking every single pCPU in turn is terribly expensive in
terms of hot cacheline pingpong, and the lock is frequently contended.

As a first opinion of this patch, you are adding another cpumask which
is going to play hot cacheline pingpong.
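
To make that concern concrete, here is a rough sketch of the
overloaded-mask idea as described in the quoted commit message. It is
not the code from the patch: overloaded_mask, nr_runnable,
runq_insert(), runq_remove() and pick_steal_candidate() are all
hypothetical names, and it assumes at most 64 pCPUs so that the mask
fits in one word, and that the caller already holds the relevant
runqueue lock around the counter updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64  /* assumption: the whole mask fits in one 64-bit word */

/* Hypothetical shared mask: bit N set means pCPU N has >= 2 runnable vCPUs. */
static _Atomic uint64_t overloaded_mask;

/* Hypothetical per-pCPU count of runnable vCPUs (idle vCPU not counted). */
static unsigned int nr_runnable[NR_CPUS];

/* A vCPU becomes runnable on 'cpu'; mark the pCPU overloaded at 2. */
static void runq_insert(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (++nr_runnable[cpu] == 2)
        atomic_fetch_or(&overloaded_mask, UINT64_C(1) << cpu);
}

/* A vCPU leaves 'cpu'; clear the bit when only 1 runnable vCPU remains. */
static void runq_remove(unsigned int cpu)
{
    assert(cpu < NR_CPUS && nr_runnable[cpu] > 0);
    if (--nr_runnable[cpu] == 1)
        atomic_fetch_and(&overloaded_mask, ~(UINT64_C(1) << cpu));
}

/* Return the lowest-numbered overloaded pCPU other than 'self', or -1. */
static int pick_steal_candidate(unsigned int self)
{
    assert(self < NR_CPUS);
    uint64_t mask = atomic_load(&overloaded_mask) & ~(UINT64_C(1) << self);

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return (int)cpu;
    return -1;
}
```

The balancer only visits pCPUs whose bit is set, which is the saving
the patch is after.  In this sketch, though, every transition across
the 2-vCPU threshold is an atomic read-modify-write on a single shared
word, so the cacheline holding it moves between whichever pCPUs are
updating it.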

~Andrew