> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> Sent: Monday, 27 February 2023 21.53
> 
> > >> Add support for programming PMU counters and reading their values at
> > >> runtime, bypassing the kernel completely.
> > >>
> > >> This is especially useful in cases where CPU cores are isolated, i.e.
> > >> run dedicated tasks. In such cases one cannot use the standard perf
> > >> utility without sacrificing latency and performance.
> > >>
> > >> Signed-off-by: Tomasz Duszynski <tduszyn...@marvell.com>
> > >> Acked-by: Morten Brørup <m...@smartsharesystems.com>
> > >

[...]

> > >> +int
> > >> +__rte_pmu_enable_group(void)
> > >> +{
> > >> +        struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
> > >> +        int ret;
> > >> +
> > >> +        if (rte_pmu.num_group_events == 0)
> > >> +                return -ENODEV;
> > >> +
> > >> +        ret = open_events(group);
> > >> +        if (ret)
> > >> +                goto out;
> > >> +
> > >> +        ret = mmap_events(group);
> > >> +        if (ret)
> > >> +                goto out;
> > >> +
> > >> +        if (ioctl(group->fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) == -1) {
> > >> +                ret = -errno;
> > >> +                goto out;
> > >> +        }
> > >> +
> > >> +        if (ioctl(group->fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) == -1) {
> > >> +                ret = -errno;
> > >> +                goto out;
> > >> +        }
> > >> +
> > >> +        rte_spinlock_lock(&rte_pmu.lock);
> > >> +        TAILQ_INSERT_TAIL(&rte_pmu.event_group_list, group, next);
> > >
> > >Hmm.. so we insert a pointer to a TLS variable into the global list?
> > >Wonder what would happen if that thread gets terminated?
> >
> > Nothing special. Any pointers to that thread-local in that thread are
> > invalidated.
> >
> > >Can memory from its TLS block get re-used (by another thread or for other
> > >purposes)?
> > >
> >
> > Why would any other thread reuse that?
> > Eventually main thread will need that data to do the cleanup.
> 
> I understand that main thread would need to access that data.
> I am not sure that it would be able to.
> Imagine a thread calls rte_pmu_read(...) and then terminates, while the
> program continues to run.

Is the example you describe here (i.e. a thread terminating in the middle of 
doing something) really a scenario DPDK is supposed to support?

> As I understand it, the address of its RTE_PER_LCORE(_event_group) will still
> remain in rte_pmu.event_group_list,
> even if it is probably not valid any more.

There should be a "destructor/done/finish" function available to remove this 
from the list.
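
To illustrate what I mean, here is a minimal sketch of such a register/unregister pair. All names in it (struct event_group, pmu, group_register(), group_unregister(), group_count(), worker()) are hypothetical stand-ins and not the API from the patch; a pthread mutex replaces rte_spinlock so the sketch builds without DPDK. The only point is that a thread removes its thread-local entry from the global list before it exits, so the list never holds a pointer into a dead thread's TLS block.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_pmu_event_group. */
struct event_group {
	TAILQ_ENTRY(event_group) next;
	bool registered;
};

TAILQ_HEAD(event_group_list, event_group);

/* Hypothetical stand-in for the global rte_pmu state. */
static struct {
	pthread_mutex_t lock;
	struct event_group_list groups;
} pmu = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.groups = TAILQ_HEAD_INITIALIZER(pmu.groups),
};

/* One group per thread (__thread is the GCC/Clang TLS keyword). */
static __thread struct event_group tls_group;

/* Add the calling thread's group to the global list (idempotent). */
static void group_register(void)
{
	if (tls_group.registered)
		return;
	pthread_mutex_lock(&pmu.lock);
	TAILQ_INSERT_TAIL(&pmu.groups, &tls_group, next);
	pthread_mutex_unlock(&pmu.lock);
	tls_group.registered = true;
}

/* Remove the calling thread's group; must run before the thread exits. */
static void group_unregister(void)
{
	if (!tls_group.registered)
		return;
	pthread_mutex_lock(&pmu.lock);
	TAILQ_REMOVE(&pmu.groups, &tls_group, next);
	pthread_mutex_unlock(&pmu.lock);
	tls_group.registered = false;
}

/* Count registered groups, e.g. for cleanup by the main thread. */
static size_t group_count(void)
{
	struct event_group *g;
	size_t n = 0;

	pthread_mutex_lock(&pmu.lock);
	TAILQ_FOREACH(g, &pmu.groups, next)
		n++;
	pthread_mutex_unlock(&pmu.lock);
	return n;
}

/* Thread body: register, do the work, unregister before returning. */
static void *worker(void *arg)
{
	(void)arg;
	group_register();
	/* ... read counters here ... */
	group_unregister();
	return NULL;
}
```

With something like this, a thread that is about to terminate calls the "finish" function itself, and whatever is still on the list when the main thread does its cleanup belongs to threads that are still alive.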

[...]

> > >Even if we'd decide to keep rte_pmu_read() as static inline (still not
> > >sure it is a good idea),
> >
> > We want to save as many CPU cycles as we possibly can, and inlining does
> > help in that matter.
> 
> Ok, so asking the same question from a different thread: how many cycles will
> it save?
> What is the difference in terms of performance when you have this function
> inlined vs not inlined?

We expect to use this in our in-house profiler library. For this reason, I have 
a very strong preference for absolute maximum performance.

Reading PMU events is for performance profiling, so I expect other potential 
users of the PMU library to share my opinion on this.
