> On May 28, 2021, at 4:12 PM, Dario Faggioli <dfaggi...@suse.com> wrote:
> 
> A !runnable unit (temporarily) present in the runq may cause us to
> stop scanning the runq itself too early. Of course, we don't run any
> non-runnable vCPUs, but we end the scan and we fallback to picking
> the idle unit. In other word, this prevent us to find there and pick
> the actual unit that we're meant to start running (which might be
> further ahead in the runq).
> 
> Depending on the vCPU pinning configuration, this may lead to such
> unit to be stuck in the runq for long time, causing malfunctioning
> inside the guest.
> 
> Fix this by checking runnable/non-runnable status up-front, in the runq
> scanning function.
> 
> Reported-by: Michał Leszczyński <michal.leszczyn...@cert.pl>
> Reported-by: Dion Kant <g.w.k...@hunenet.nl>
> Signed-off-by: Dario Faggioli <dfaggi...@suse.com>

Thanks for tracking this down, Dario!

Reviewed-by: George Dunlap <george.dun...@citrix.com>

Just one comment:
> @@ -3496,8 +3500,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>          * some budget, then choose it.
>          */
>         if ( (yield || svc->credit > snext->credit) &&
> -             (!has_cap(svc) || unit_grab_budget(svc)) &&
> -             unit_runnable_state(svc->unit) )
> +             (!has_cap(svc) || unit_grab_budget(svc)) )
>             snext = svc;

By the same logic, shouldn’t we also move the `(!has_cap() …)` clause into a 
separate `if(x) continue` clause?  There may be runnable units further down the 
queue which aren’t capped / haven’t exhausted their budget yet.

 -George

Reply via email to