Since last year's OSPM Summit we started conceiving the idea that task wakeup path could be better tuned for certain classes of workloads and usage scenarios. Various people showed interest for a possible tuning interface for the scheduler wakeup path.
.:: The Problem =============== The discussions we had so far [1] have not been effective in clearly identifying if a common tuning surface is possible. The last discussion at this year's OSPM Summit [2,3] was also kind of inconclusive and left us with the message: start by collecting the requirements and then see what interface fits them the best. General consensus is that a unified interface can be challenging and maybe not feasible. However, generalisation is also a value and we should strive for it whenever it's possible. Someone might think that we did not put enough effort in the analysis of requirements. Perhaps what we missed so far is also a structured and organised way to collect requirements which also can help in factoring out the things they have in common. .:: The Proposal ================ This thread aims at providing a guided template for the description of different task wakeup use-cases. It does that by setting a series of questions aimed at precisely describing what's "currently broken", what we would like to have instead and how we could achieve it. What we propose here is that, for each wakeup use-case, we start by replying to this email to provide the required details/comments for a predefined list of questions. This will generate independent discussion threads. Each thread will be free to focus on a specific proposal but still all the thread will be reasoning around a common set of fundamental concepts. The hope is that, by representing all the use-cases as sufficiently detailed responses to a common set of questions, once the discussion settles down, we can more easily verify if there are common things surfacing which then can be naturally wrapped into a unified user-space API. A first use-case description, following the template guidelines, will be posted shortly after this message. This also will provide an example for how to use the template. NOTE: Whenever required, pseudo-code or simplified C can be used. I hope this can drive a fruitful discussion in preparation for LPC! Best, Patrick ---8<--- For templates submissions: reply only to the following ---8<--- .:: Scheduler Wakeup Path Requirements Collection Template ========================================================== A) Name: unique one-liner name for the proposed use-case B) Target behaviour: one paragraph to describe the wakeup path issue C) Existing control paths: reference to code paths to justify B) Assuming v5.6 as the reference kernel, this section should provide links to code paths such as, e.g. fair.c:3917 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?h=v5.6#n3917 Alternatively code snippets can be added inline, e.g. /* * The 'current' period is already promised to the current tasks, * however the extra weight of the new task will slow them down a * little, place the new task so that it fits in the slot that * stays open at the end. */ if (initial && sched_feat(START_DEBIT)) vruntime += sched_vslice(cfs_rq, se); NOTE: if the use-case exists only outside the mainline Linux kernel this section can stay empty D) Desired behaviour: one paragraph to describe the desired update NOTE: the current mainline expression is assumed to be correct for existing use-cases. Thus, here we are looking for run-time tuning of those existing features. E) Existing knobs (if any): reference to whatever existing tunable Some features can already be tuned, but perhaps only via compile time knobs, SCHED_FEATs or system wide tunable. If that's the case, we should document them here and explain how they currently work and what are (if any) the implicit assumptions, e.g. what is the currently implemented scheduling policy/heuristic. F) Existing knobs (if any): one paragraph description of the limitations If the existing knobs are not up to the job for this use-case, shortly explain here why. It could be because a tuning surface is already there but it's hardcoded (e.g. compile time knob) or too coarse grained (e.g. a SCHED_FEAT). G) Proportionality Analysis: check the nature of the target behavior Goal here is to verify and discuss if the behaviour (B) has a proportional nature: different values of the control knobs (E) are expected to produce different effects for (B). Special care should be taken to check if the target behaviour has an intrinsically "binary nature", i.e. only two values make really sense. In this case it would be very useful to argument why a generalisation towards a non-binary behaviours does NOT make sense. H) Range Analysis: identify meaningful ranges If (G) was successfully, i.e. there is a proportional correlation between (E) and (B), discuss here about a meaningful range for (E) and (F). I) System-Wide tuning: which knobs are required If required, list new additional tunables here, how they should be exposed and (if required) which scheduling classes will be affected. J) Per-Task tuning: which knobs are required Describe which knobs should be added and which task specific API (e.g. sched_setscheduler(), prctl(), ...) they should be used. K) Task-Group tuning: which knobs are required If the use-case can benefit from a task-group tuning, here it should **briefly described** how the expected behaviour can be mapped on a cgroup v2 unified hierarchy. NOTE: implementation details are not required but we should be able to hint at which cgroup v2 resource distribution model [5] should be applied. ---8<--- For templates submissions: exclude the following ---8<--- .:: References ============== [1] [Discussion v2] Usecases for the per-task latency-nice attribute Message-ID: 2bd46086-43ff-f130-8720-8eec694eb...@linux.ibm.com https://lore.kernel.org/lkml/2bd46086-43ff-f130-8720-8eec694eb...@linux.ibm.com [2] Latency Nice: Implementation and UseCases for Scheduler Optmizations https://ospm.lwn.net/playback/presentation/2.0/playback.html?meetingId=380dd88f044f67ee4c94d0a2a4fb7c3f46cb6391-1589459486615&t=42m37s [3] LWN: The many faces of "latency nice" https://lwn.net/Articles/820659 [4] [PATCH v5 0/4] Introduce per-task latency_nice for scheduler hints Message-ID: 20200228090755.22829-1-pa...@linux.ibm.com https://lore.kernel.org/lkml/20200228090755.22829-1-pa...@linux.ibm.com [5] Control Group v2: Resource Distribution Models https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst