On 5/14/19 10:27 AM, Subhra Mazumdar wrote:
On 5/14/19 9:03 AM, Steven Sistare wrote:
On 5/13/2019 7:35 AM, Peter Zijlstra wrote:
On Mon, May 13, 2019 at 03:04:18PM +0530, Viresh Kumar wrote:
On 10-05-19, 09:21, Peter Zijlstra wrote:
I don't hate this per se; but the whole select_idle_sibling() thing is
something that needs looking at.
There was the task stealing thing from Steve that looked interesting and
that would render your approach unfeasible.
I am surely missing something, as I don't see how that patchset will
make this patchset perform any worse than it already does.
Nah; I just misremembered. I know Oracle has a patch set poking at
select_idle_sibling() _somewhere_ (as do I), and I just found the wrong
one.
Basically everybody is complaining select_idle_sibling() is too
expensive for checking the entire LLC domain, except for FB (and thus
likely some other workloads too) that depend on it to kill their tail
latency.
But I suppose we could still do this, even if we scan only a subset of
the LLC, just keep track of the last !idle CPU running only SCHED_IDLE
tasks and pick that if you do not (in your limited scan) find a better
candidate.
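The limited scan with a SCHED_IDLE fallback described above could look roughly like the following. This is a user-space sketch only, not kernel code: `struct cpu_state`, `pick_cpu_limited()` and the `limit` parameter are hypothetical names introduced for illustration, and real code would walk the LLC cpumask rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for this sketch; not a kernel structure. */
struct cpu_state {
	unsigned int nr_running;	/* all runnable tasks on the CPU */
	unsigned int nr_sched_idle;	/* runnable tasks with policy SCHED_IDLE */
};

static bool cpu_is_idle(const struct cpu_state *c)
{
	return c->nr_running == 0;
}

/* Not idle, but every runnable task is SCHED_IDLE. */
static bool cpu_runs_only_sched_idle(const struct cpu_state *c)
{
	return c->nr_running > 0 && c->nr_running == c->nr_sched_idle;
}

/*
 * Scan at most `limit` CPUs of an LLC with `nr_cpus` CPUs, starting at
 * `start` and wrapping around. Return the first fully idle CPU; failing
 * that, the last scanned CPU that runs only SCHED_IDLE tasks; failing
 * that, -1.
 */
static int pick_cpu_limited(const struct cpu_state *llc, size_t nr_cpus,
			    size_t start, size_t limit)
{
	int fallback = -1;

	assert(llc != NULL);
	if (nr_cpus == 0)
		return -1;
	if (limit > nr_cpus)
		limit = nr_cpus;

	for (size_t i = 0; i < limit; i++) {
		size_t cpu = (start + i) % nr_cpus;

		if (cpu_is_idle(&llc[cpu]))
			return (int)cpu;
		if (cpu_runs_only_sched_idle(&llc[cpu]))
			fallback = (int)cpu;	/* remember, keep looking */
	}
	return fallback;
}
```

The fallback costs one extra comparison per scanned CPU, so it does not make the limited scan meaningfully more expensive.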
Subhra posted a patch that incrementally searches for an idle CPU in the LLC,
remembering the last CPU examined, and searching a fixed number of CPUs from
there. That technique is compatible with the one that Viresh suggests; the
incremental search would stop if a SCHED_IDLE CPU was found.
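As a rough illustration of that combination, the sketch below keeps a per-LLC cursor and examines a fixed number of CPUs per call, stopping early at an idle or SCHED_IDLE-only CPU. It is a user-space model under stated assumptions, not the code from the patch: `struct llc_search`, `enum cpu_kind`, `search_incremental()` and `nr_scan` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a CPU for this sketch. */
enum cpu_kind { CPU_BUSY, CPU_IDLE, CPU_SCHED_IDLE_ONLY };

/* Hypothetical per-LLC search state; `next` persists across calls. */
struct llc_search {
	size_t nr_cpus;
	size_t next;	/* where the following search resumes */
};

/*
 * Examine up to `nr_scan` CPUs starting at s->next, wrapping around.
 * Stop at the first CPU that is idle or runs only SCHED_IDLE tasks and
 * return it; return -1 if none was seen. Either way, record where to
 * resume so that successive calls walk the whole LLC.
 */
static int search_incremental(struct llc_search *s,
			      const enum cpu_kind *kind, size_t nr_scan)
{
	size_t cpu;
	int found = -1;

	assert(s != NULL && kind != NULL && s->nr_cpus > 0);
	if (nr_scan > s->nr_cpus)
		nr_scan = s->nr_cpus;

	cpu = s->next % s->nr_cpus;
	for (size_t i = 0; i < nr_scan; i++) {
		size_t cur = cpu;

		cpu = (cpu + 1) % s->nr_cpus;
		if (kind[cur] != CPU_BUSY) {
			found = (int)cur;
			break;
		}
	}
	s->next = cpu;
	return found;
}
```

Each call has a bounded cost regardless of LLC size, at the price of sometimes missing an idle CPU that lies outside the current window.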
This was the last version of the patchset I sent:
https://lkml.org/lkml/2018/6/28/810
Also, select_idle_core() is a net negative for certain workloads like OLTP,
so I had added a SCHED_FEAT to be able to disable it.
Forgot to add: the cpumask_weight computation may not be O(1) with a large
number of CPUs, so it needs to be precomputed in a per-cpu variable to
optimize further. That part is missing from the above patchset.
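To make the cost argument concrete, here is a small user-space sketch of caching a mask weight per CPU. The message does not say which mask is being weighed; the sketch assumes it is the LLC span. `struct cpu_mask`, `llc_weight[]`, `cache_llc_weight()` and `cached_llc_weight()` are hypothetical names, and a plain array stands in for a kernel per-cpu variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS	256
#define MASK_WORDS	(MAX_CPUS / 64)

/* Hypothetical stand-in for a cpumask. */
struct cpu_mask {
	uint64_t bits[MASK_WORDS];
};

/* Count set bits word by word: the cost grows with MASK_WORDS. */
static unsigned int mask_weight(const struct cpu_mask *m)
{
	unsigned int w = 0;

	for (size_t i = 0; i < MASK_WORDS; i++)
		for (uint64_t v = m->bits[i]; v; v &= v - 1)
			w++;
	return w;
}

/* Hypothetical per-CPU cache of the weight of each CPU's LLC span. */
static unsigned int llc_weight[MAX_CPUS];

/* Slow path: run when the topology changes, for each LLC span. */
static void cache_llc_weight(const struct cpu_mask *span)
{
	unsigned int w = mask_weight(span);

	for (size_t cpu = 0; cpu < MAX_CPUS; cpu++)
		if (span->bits[cpu / 64] & (UINT64_C(1) << (cpu % 64)))
			llc_weight[cpu] = w;
}

/* Fast path: a single array load instead of a scan of the mask. */
static unsigned int cached_llc_weight(size_t cpu)
{
	assert(cpu < MAX_CPUS);
	return llc_weight[cpu];
}
```

The wakeup path then reads the cached value, and the full count is only recomputed when the span itself changes.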
Thanks,
Subhra
I also fiddled with select_idle_sibling, maintaining a per-LLC bitmap of idle
CPUs, updated with atomic operations. Performance was basically unchanged for
the workloads I tested, and I inserted timers around the idle search showing it
was a very small fraction of time both before and after my changes. That led me
to ignore the push side and optimize the pull side with task stealing.
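For readers unfamiliar with the idea, a per-LLC idle bitmap updated with atomic operations can be modelled in user space as below. This is only a sketch using C11 atomics, not the code that was tested: `struct llc_idle_map`, `mark_idle()`, `mark_busy()` and `find_idle()` are hypothetical names, and the map is limited to 64 CPUs for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-LLC idle bitmap; bit N set means CPU N is idle. */
struct llc_idle_map {
	_Atomic uint64_t idle;
};

/* Called by a CPU as it goes idle. */
static void mark_idle(struct llc_idle_map *m, unsigned int cpu)
{
	assert(cpu < 64);
	atomic_fetch_or(&m->idle, UINT64_C(1) << cpu);
}

/* Called by a CPU as it picks up work. */
static void mark_busy(struct llc_idle_map *m, unsigned int cpu)
{
	assert(cpu < 64);
	atomic_fetch_and(&m->idle, ~(UINT64_C(1) << cpu));
}

/*
 * Return the lowest-numbered idle CPU, or -1 if none. The snapshot can
 * be stale by the time the caller acts on it, so the result is a hint.
 */
static int find_idle(struct llc_idle_map *m)
{
	uint64_t snap = atomic_load(&m->idle);

	for (int cpu = 0; cpu < 64; cpu++)
		if (snap & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

The search becomes a single load plus a bit scan, but every idle entry and exit now writes a cache line shared by the whole LLC, which is one plausible reason such a change may not show a net gain.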
I would be very interested in hearing from folks that have workloads that
demonstrate that select_idle_sibling is too expensive.
- Steve