[PATCH] perf: fix pmu::filter_match for SW-led groups

Mark Rutland Tue, 14 Jun 2016 08:11:38 -0700

Commit 66eb579e66ecfea5 ("perf: allow for PMU-specific event filtering")
added pmu::filter_match. This was intended to avoid HW constraints on
events from resulting in extremely pessimistic scheduling.


However, pmu::filter_match is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add time, and when this happens we'll give up on
scheduling any event groups later in the list until they are rotated
ahead of the failing group.

This can result in extremely sub-optimal scheduling behaviour, e.g. if
running the following on a big.LITTLE platform:

$ taskset -c 0 ./perf stat \
 -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
 -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
 ls

     <not counted>      context-switches                                        
      (0.00%)
     <not counted>      armv8_cortex_a57/config=0x11/                           
          (0.00%)
                24      context-switches                                        
      (37.36%)
          57589154      armv8_cortex_a53/config=0x11/                           
          (37.36%)

Here the 'a53' event group was always eligible to be scheduled, but the
a57 group never eligible to be scheduled, as the task was always affine
to a Cortex-A53 CPU. The SW (group leader) event in the 'a57' group was
eligible, but the HW event failed at pmu::add time, resulting in
ctx_flexible_sched_in giving up on scheduling further groups with HW
events.

One way of avoiding this is to check pmu::filter_match on siblings as
well as the group leader. If any of these fail their pmu::filter_match,
we must skip the entire group before attempting to add any events.

Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Signed-off-by: Mark Rutland <[email protected]>
---
 kernel/events/core.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

I've tried to find a better way of handling this (without needing to walk the
siblings list), but so far I'm at a loss. At least it's "only" O(n) in the size
of the sibling list we were going to walk anyway.

I suspect that at a more fundamental level, I need to stop sharing a
perf_hw_context between HW PMUs (i.e. replace task_struct::perf_event_ctxp with
something that can handle multiple HW PMUs). From previous attempts I'm not
sure if that's going to be possible.

Any ideas appreciated!

Mark.

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9c51ec3..c0b6db0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1678,12 +1678,32 @@ static bool is_orphaned_event(struct perf_event *event)
        return event->state == PERF_EVENT_STATE_DEAD;
 }
 
-static inline int pmu_filter_match(struct perf_event *event)
+static inline int __pmu_filter_match(struct perf_event *event)
 {
        struct pmu *pmu = event->pmu;
        return pmu->filter_match ? pmu->filter_match(event) : 1;
 }
 
+/*
+ * Check whether we should attempt to schedule an event group based on
+ * PMU-specific filtering. An event group can consist of HW and SW events,
+ * potentially with a SW leader, so we must check all the filters to determine
+ * whether a group is schedulable.
+ */
+static inline int pmu_filter_match(struct perf_event *event)
+{
+       struct perf_event *child;
+
+       if (!__pmu_filter_match(event))
+               return 0;
+
+       list_for_each_entry(child, &event->sibling_list, group_entry)
+               if (!__pmu_filter_match(child))
+                       return 0;
+
+       return 1;
+}
+
 static inline int
 event_filter_match(struct perf_event *event)
 {
-- 
1.9.1

[PATCH] perf: fix pmu::filter_match for SW-led groups

Reply via email to