On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool
> > > > hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > >  	int counters = 1;
> > > >  	int num;
> > > > -	event_group[0] = leader;
> > > > -	if (!is_software_event(leader)) {
> > > > -		if (leader->pmu != event->pmu)
> > > > -			return false;
> > > > +	if (leader == event)
> > > > +		return true;
> > > > -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader,
> > > > event))
> > > > -			event_group[counters++] = event;
> > > > -	}
> > > > +	event_group[0] = event;
> > > > +	if (leader->pmu == event->pmu &&
> > > > !hisi_pcie_pmu_cmp_event(leader, event))
> > > > +		event_group[counters++] = leader;
> > >
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).
> >
> > Yeah, this had a dodgy smell when I first came across it, but after doing
> > all the digging I think it does actually work out - the trick seems to be
> > the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> > implication the PMU is going to be stopped while scheduling in/out the whole
> > group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> > hardware (even though the first call nukes the rest of the event
> > configuration), then the events should stay in sync.
>
> I don't think that's sufficient. If nothing else, overflow is handled
> per-event, and for a group of two identical events, upon overflow
> hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
> the first event, and the second event will see an arbitrary
> discontinuity. Maybe no-one has spotted that due to the 2^63 counter
> period that we program, but this is clearly bogus.
>
> In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
> groups aren't handled atomically, and snapshots of the counters won't be
> atomic.
>
> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
>

[...]

> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
>
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.

FWIW, I'm happy to go do that as a follow-up, so if that's a pain, feel
free to leave that as-is for now.

Mark.
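
For illustration only, the overflow concern quoted above can be sketched
as a userspace toy model. This is not the driver code: every toy_* name
and TOY_INIT_VAL is hypothetical, and the model assumes that overflow
handling for one event folds in the delta and then rewinds the shared
counter, while the second event keeps its stale "previous" value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: none of these names exist in the driver. */
#define TOY_INIT_VAL 0x8000000000000000ULL /* stand-in for a 2^63 start value */

static uint64_t toy_hw_counter; /* the single shared "hardware" counter */

struct toy_event {
	uint64_t prev;  /* last hardware value this event observed */
	uint64_t count; /* accumulated event count */
};

/* Fold the hardware delta since the last read into the event. */
static void toy_event_update(struct toy_event *e)
{
	uint64_t now = toy_hw_counter;

	e->count += now - e->prev;
	e->prev = now;
}

/* Reprogram the shared counter on behalf of one event only. */
static void toy_set_period(struct toy_event *e)
{
	toy_hw_counter = TOY_INIT_VAL;
	e->prev = TOY_INIT_VAL;
}

/* Runs the scenario and returns the count seen by the second event. */
uint64_t toy_run(void)
{
	struct toy_event a = { TOY_INIT_VAL, 0 };
	struct toy_event b = { TOY_INIT_VAL, 0 };

	toy_hw_counter = TOY_INIT_VAL;

	toy_hw_counter += 1000; /* 1000 real events occur */
	toy_event_update(&a);   /* "overflow" handled for the first event... */
	toy_set_period(&a);     /* ...which rewinds the shared counter */

	toy_hw_counter += 10;   /* 10 more real events occur */
	toy_event_update(&a);
	toy_event_update(&b);   /* b never saw the rewind */

	assert(a.count == 1010); /* a accounted for everything */
	printf("a=%llu b=%llu\n",
	       (unsigned long long)a.count, (unsigned long long)b.count);
	return b.count;
}
```

In this model 1010 events occurred and the first event reports 1010,
but the second event, which shares the counter, reports only 10,
because the 1000 events before the rewind were never folded into it.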