On Tue, 19 Mar 2024 at 10:08, Tobias Huschle <husc...@linux.ibm.com> wrote:
>
> On 2024-03-18 15:45, Luis Machado wrote:
> > On 3/14/24 13:45, Tobias Huschle wrote:
> >> On Fri, Mar 08, 2024 at 03:11:38PM +0000, Luis Machado wrote:
> >>> On 2/28/24 16:10, Tobias Huschle wrote:
> >>>>
> >>>> Questions:
> >>>> 1. The kworker getting its negative lag occurs in the following scenario
> >>>>    - kworker and a cgroup are supposed to execute on the same CPU
> >>>>    - one task within the cgroup is executing and wakes up the kworker
> >>>>    - kworker with 0 lag gets picked immediately and finishes its
> >>>>      execution within ~5000ns
> >>>>    - on dequeue, kworker gets assigned a negative lag
> >>>>    Is this expected behavior? With this short execution time, I would
> >>>>    expect the kworker to be fine.
> >>>
> >>> That strikes me as a bit odd as well. Have you been able to determine
> >>> how a negative lag is assigned to the kworker after such a short runtime?
> >>>
> >>
> >> I did some more trace reading and found something.
> >>
> >> What I observed if everything runs regularly:
> >> - vhost and kworker run alternating on the same CPU
> >> - if the kworker is done, it leaves the runqueue
> >> - vhost wakes up the kworker if it needs it
> >> --> this means:
> >>     - vhost starts alone on an otherwise empty runqueue
> >>     - it seems like it never gets dequeued
> >>       (unless another unrelated task joins or migration hits)
> >>     - if vhost wakes up the kworker, the kworker gets selected
> >>     - vhost runtime > kworker runtime
> >>     --> kworker gets positive lag and gets selected immediately next time
> >>
> >> What happens if it does go wrong:
> >> From what I gather, there seem to be occasions where the vhost either
> >> executes surprisingly quickly, or the kworker surprisingly slowly. If
> >> these outliers reach critical values, it can happen that
> >>     vhost runtime < kworker runtime
> >> which now causes the kworker to get the negative lag.
> >>
> >> In this case it seems like the vhost is very fast in waking up the
> >> kworker, and, coincidentally, the kworker takes more time than usual
> >> to finish. We are talking 4-digit to low 5-digit nanoseconds.
> >>
> >> So, for these outliers, the scheduler extrapolates that the kworker
> >> out-consumes the vhost and should be slowed down, although in the
> >> majority of other cases this does not happen.
> >
> > Thanks for providing the above details, Tobias. It does seem like EEVDF
> > is strict about the eligibility checks, making tasks wait when their
> > lags are negative, even if just a little bit as in the case of the
> > kworker.
> >
> > There was a patch to disable the eligibility checks
> > (https://lore.kernel.org/lkml/20231013030213.2472697-1-youssefes...@chromium.org/),
> > which would make EEVDF more like EVDF, though the deadline comparison
> > would probably still favor the vhost task instead of the kworker with
> > the negative lag.
> >
> > I'm not sure if you tried it, but I thought I'd mention it.
>
> Haven't seen that one yet. Unfortunately, it does not help to ignore the
> eligibility.
>
> I'm inclined to rather propose a documentation change, which describes
> that tasks should not rely on woken-up tasks being scheduled immediately.
Where do you see such an assumption? Even before EEVDF, there was nothing
that ensured such behavior. With CFS (legacy or eevdf), you can't know
whether the newly woken task will run first or not.

> > Changing things in the code to address the specific scenario I'm
> > seeing seems to mostly create unwanted side effects and/or would require
> > the definition of some magic cut-off values.
> >
>