On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.mur...@arm.com> wrote: > > On 2025-08-26 2:03 pm, Peter Zijlstra wrote: > > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote: > >> It may have been different long ago, but today it seems wrong for these > >> drivers to skip counting disabled sibling events in group validation, > >> given that perf_event_enable() could make them schedulable again, and > >> thus increase the effective size of the group later. Conversely, if a > >> sibling event is truly dead then it stands to reason that the whole > >> group is dead, so it's not worth going to any special effort to try to > >> squeeze in a new event that's never going to run anyway. Thus, we can > >> simply remove all these checks. > > > > So currently you can do sort of a manual event rotation inside an > > over-sized group and have it work. > > > > I'm not sure if anybody actually does this, but its possible. > > > > Eg. on a PMU that supports only 4 counters, create a group of 5 and > > periodically cycle which of the 5 events is off.
I'm not sure this is true, I thought this would fail in the perf_event_open when adding the 5th event and there being insufficient counters for the group. Not all PMUs validate a group will fit on the counters, but I thought at least Intel's core PMU would validate and not allow this. Fwiw, the metric code is reliant on this behavior as by default all events are placed into a weak group: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next#n631 Weak groups are really just groups that when the perf_event_open fails retry with the grouping removed. PMUs that don't fail the perf_event_open are problematic as the reads just report "not counted" and the metric doesn't work. Sometimes the PMU can't help it due to errata. There are a bunch of workarounds for those cases carried in the perf tool, but in general weak groups working is relied upon: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/pmu-events.h?h=perf-tools-next#n16 Thanks, Ian > > So I'm not against changing this, but changing stuff like this always > > makes me a little fearful -- it wouldn't be the first time that when it > > finally trickles down to some 'enterprise' user in 5 years someone comes > > and finally says, oh hey, you broke my shit :-( > > Eww, I see what you mean... and I guess that's probably lower-overhead > than actually deleting and recreating the sibling event(s) each time, > and potentially less bother then wrangling multiple groups for different > combinations of subsets when one simply must still approximate a complex > metric that requires more counters than the hardware offers. > > I'm also not keen to break anything that wasn't already somewhat broken, > especially since this patch is only intended as cleanup, so either we > could just drop it altogether, or perhaps I can wrap the existing > behaviour in a helper that can at least document this assumption and > discourage new drivers from copying it. Am I right that only > PERF_EVENT_STATE_{OFF,ERROR} would matter for this, though, and my > reasoning for state <= PERF_EVENT_STATE_EXIT should still stand? As for > the fiddly discrepancy with enable_on_exec between arm_pmu and others > I'm not really sure what to think... > > Thanks, > Robin.