On Sun, Jan 1, 2017 at 12:20 PM, David Carrillo-Cisneros <davi...@google.com> wrote:
> From: Mark Rutland <mark.rutl...@arm.com>
>
> On Thu, Nov 10, 2016 at 05:26:32PM +0100, Peter Zijlstra wrote:
>> On Thu, Nov 10, 2016 at 02:10:37PM +0000, Mark Rutland wrote:
>>
>> > Sure, that sounds fine for scheduling (including big.LITTLE).
>> >
>> > I might still be misunderstanding something, but I don't think that
>> > helps Kan's case: since INACTIVE events which will fail their filters
>> > (including the CPU check) will still be in the tree, they will still
>> > have to be iterated over.
>> >
>> > That is, unless we also sort the tree by event->cpu, or if in those
>> > cases we only care about ACTIVE events and can use an active list.
>>
>> A few emails back up I wrote:
>>
>> >> If we stick all events in an RB-tree sorted on: {pmu,cpu,runtime} we
>
> Ah, sorry. Clearly I wouldn't pass a reading comprehension test today.
>
>> Looking at the code there's also cgroup muck, not entirely sure where in
>> the sort order that should go if at all.
>>
>> But having pmu and cpu in there would cure the big-little and
>> per-task-per-cpu event issues.
>
> Yup, that all makes sense to me now (modulo the cgroup stuff I also
> haven't considered yet).

cgroup events are stored in each pmu's cpuctx, so they wouldn't benefit
from a {pmu,cpu} sort order. Yet the RB-tree would help if it could use
cgroup as the key for cpu contexts.

Is there a reason to have runtime as part of the RB-tree key? Couldn't a
FIFO list work just fine? A node could have an ACTIVE and an INACTIVE
FIFO list; events would move in and out of the tree on ioctl, and
between the ACTIVE and INACTIVE lists on sched in/out. This would speed
up both sched in and sched out.

The node would be something like this:

	struct ctx_rbnode {
		struct rb_node node;
		struct list_head active_events;
		struct list_head inactive_events;
	};

And the insertion order would be {pmu, cpu} for task contexts (cpu == -1
for events without a fixed cpu) and {cgroup} for cpuctxs (CPU events
would have a NULL cgrp).

I am interested in getting this to work as part of the cgroup context
switch optimization that CQM/CMT needs. See the discussion in:
https://patchwork.kernel.org/patch/9478617/

Is anyone actively working on it?

Thanks,
David
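
For illustration only, the {pmu, cpu} insertion order for task contexts
could be sketched in plain userspace C as below. The names ctx_key and
ctx_key_cmp are hypothetical, and the pmu is represented by an integer
stand-in rather than the kernel's struct pmu pointer; this is an
assumption-laden sketch, not code from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sort key; not kernel code. */
struct ctx_key {
	uintptr_t pmu;	/* stand-in for the struct pmu pointer */
	int cpu;	/* -1 for events without a fixed cpu */
};

/*
 * Compare two keys by pmu first, then by cpu.
 * Returns a negative value, zero, or a positive value, like strcmp().
 */
int ctx_key_cmp(const struct ctx_key *a, const struct ctx_key *b)
{
	assert(a && b);

	if (a->pmu != b->pmu)
		return a->pmu < b->pmu ? -1 : 1;
	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	return 0;
}
```

With this ordering, the cpu == -1 node of a given pmu sorts immediately
before that pmu's per-cpu nodes, so one lookup on {pmu, -1} followed by
one on {pmu, cpu} would reach every node relevant to a sched in on that
cpu.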