On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > >         int counters = 1;
> > > >         int num;
> > > > -       event_group[0] = leader;
> > > > -       if (!is_software_event(leader)) {
> > > > -               if (leader->pmu != event->pmu)
> > > > -                       return false;
> > > > +       if (leader == event)
> > > > +               return true;
> > > > -               if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > -                       event_group[counters++] = event;
> > > > -       }
> > > > +       event_group[0] = event;
> > > > +       if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > +               event_group[counters++] = leader;
> > > 
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).
> > 
> > Yeah, this had a dodgy smell when I first came across it, but after doing
> > all the digging I think it does actually work out - the trick seems to be
> > the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> > implication the PMU is going to be stopped while scheduling in/out the whole
> > group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> > hardware (even though the first call nukes the rest of the event
> > configuration), then the events should stay in sync.
> 
> I don't think that's sufficient. If nothing else, overflow is handled
> per-event, and for a group of two identical events, upon overflow
> hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
> the first event, and the second event will see an arbitrary
> discontinuity. Maybe no-one has spotted that due to the 2^63 counter
> period that we program, but this is clearly bogus.
> 
> In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
> groups aren't handled atomically, and snapshots of the counters won't be
> atomic.
> 
> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
> 

[...]

> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
> 
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.

FWIW, I'm happy to go do that as a follow-up, so if that's a pain, feel
free to leave that as-is for now.
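
To make the overflow concern above concrete, here is a toy userspace model
of two identical events sharing one HW counter. It is only a sketch, not
the driver's code: every toy_* name, the struct layout, and the
update-then-reprogram order are assumptions made for illustration. The only
detail taken from the discussion is the 2^63 reload value.

```c
#include <stdint.h>

/* Hypothetical reload value, modelling the 2^63 period mentioned above. */
#define TOY_RELOAD 0x8000000000000000ULL

/* Toy stand-in for a perf event: last raw value seen, and running total. */
struct toy_event {
	uint64_t prev;
	uint64_t count;
};

/* The single HW counter that both events are (incorrectly) sharing. */
static uint64_t toy_hw_counter;

/* Fold the delta since the last observation into the event's total. */
static void toy_event_update(struct toy_event *ev)
{
	uint64_t now = toy_hw_counter;

	ev->count += now - ev->prev;	/* modular arithmetic handles wrap */
	ev->prev = now;
}

/* Reprogram the HW counter; this is what the other event never sees. */
static void toy_set_period(struct toy_event *ev)
{
	toy_hw_counter = TOY_RELOAD;
	ev->prev = TOY_RELOAD;
}

/* Per-event overflow handling: update first, then reprogram. */
static void toy_handle_overflow(struct toy_event *ev)
{
	toy_event_update(ev);
	toy_set_period(ev);
}

/*
 * Run both events over the same interval of 'ticks' counted events, then
 * handle the overflow for each in turn. Returns how far the two totals
 * have drifted apart; a correct implementation would return 0.
 */
uint64_t toy_discontinuity(uint64_t ticks)
{
	struct toy_event a = { TOY_RELOAD, 0 };
	struct toy_event b = { TOY_RELOAD, 0 };

	toy_hw_counter = TOY_RELOAD + ticks;

	toy_handle_overflow(&a);	/* reprograms the shared counter */
	toy_handle_overflow(&b);	/* sees the reload value, not 'ticks' */

	return a.count - b.count;
}
```

In this model the second event loses the whole interval, because the
counter was already reset to the reload value by the time it is read, so
toy_discontinuity(ticks) returns ticks rather than 0. What the real
hardware and driver do in this case may well differ in detail.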

Mark.
