> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> Sent: Monday, 27 February 2023 21.53
>
> > >> Add support for programming PMU counters and reading their values in
> > >> runtime bypassing kernel completely.
> > >>
> > >> This is especially useful in cases where CPU cores are isolated i.e
> > >> run dedicated tasks. In such cases one cannot use standard perf
> > >> utility without sacrificing latency and performance.
> > >>
> > >> Signed-off-by: Tomasz Duszynski <tduszyn...@marvell.com>
> > >> Acked-by: Morten Brørup <m...@smartsharesystems.com>
> > >

[...]

> > >> +int
> > >> +__rte_pmu_enable_group(void)
> > >> +{
> > >> +	struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
> > >> +	int ret;
> > >> +
> > >> +	if (rte_pmu.num_group_events == 0)
> > >> +		return -ENODEV;
> > >> +
> > >> +	ret = open_events(group);
> > >> +	if (ret)
> > >> +		goto out;
> > >> +
> > >> +	ret = mmap_events(group);
> > >> +	if (ret)
> > >> +		goto out;
> > >> +
> > >> +	if (ioctl(group->fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) == -1) {
> > >> +		ret = -errno;
> > >> +		goto out;
> > >> +	}
> > >> +
> > >> +	if (ioctl(group->fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) == -1) {
> > >> +		ret = -errno;
> > >> +		goto out;
> > >> +	}
> > >> +
> > >> +	rte_spinlock_lock(&rte_pmu.lock);
> > >> +	TAILQ_INSERT_TAIL(&rte_pmu.event_group_list, group, next);
> > >
> > >Hmm.. so we insert pointer to TLS variable into the global list?
> > >Wonder what would happen if that thread get terminated?
> >
> > Nothing special. Any pointers to that thread-local in that thread are
> > invalided.
> >
> > >Can memory from its TLS block get re-used (by other thread or for other
> > >purposes)?
> > >
> >
> > Why would any other thread reuse that?
> > Eventually main thread will need that data to do the cleanup.
>
> I understand that main thread would need to access that data.
> I am not sure that it would be able to.
> Imagine thread calls rte_pmu_read(...) and then terminates, while program
> continues to run.

Is the example you describe here (i.e. a thread terminating in the middle of
doing something) really a scenario DPDK is supposed to support?

> As I understand address of its RTE_PER_LCORE(_event_group) will still remain
> in rte_pmu.event_group_list,
> even if it is probably not valid any more.

There should be a "destructor/done/finish" function available to remove this
from the list.

[...]

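To illustrate the kind of "destructor/done/finish" hook I have in mind, here is a minimal standalone sketch. It is not the patch's code: struct event_group, group_enable(), group_fini() and group_count() are hypothetical names, a C11 atomic_flag stands in for rte_spinlock_t, and a plain _Thread_local variable stands in for RTE_PER_LCORE(_event_group). The only point is that a thread unlinks its own group from the global list before it exits, so the list never holds a pointer into a dead TLS block.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-in for struct rte_pmu_event_group. */
struct event_group {
	TAILQ_ENTRY(event_group) next;
	bool enabled;
};

TAILQ_HEAD(event_group_list, event_group);

/* Stand-ins for rte_pmu.lock and rte_pmu.event_group_list. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static struct event_group_list group_list = TAILQ_HEAD_INITIALIZER(group_list);

/* Stand-in for RTE_PER_LCORE(_event_group). */
static _Thread_local struct event_group tls_group;

static void list_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
		; /* spin until the flag is clear */
}

static void list_lock_release(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Link the calling thread's group into the global list (once). */
static void group_enable(void)
{
	struct event_group *group = &tls_group;

	if (group->enabled)
		return;

	list_lock_acquire();
	TAILQ_INSERT_TAIL(&group_list, group, next);
	list_lock_release();
	group->enabled = true;
}

/* The proposed "fini" hook: unlink the calling thread's group again.
 * It must run on the owning thread before that thread terminates. */
static void group_fini(void)
{
	struct event_group *group = &tls_group;

	if (!group->enabled)
		return;

	list_lock_acquire();
	TAILQ_REMOVE(&group_list, group, next);
	list_lock_release();
	group->enabled = false;
}

/* Count the groups currently linked into the global list. */
static size_t group_count(void)
{
	struct event_group *g;
	size_t n = 0;

	list_lock_acquire();
	TAILQ_FOREACH(g, &group_list, next)
		n++;
	list_lock_release();
	return n;
}
```

A worker would call group_enable() before its first counter read and group_fini() as its last step. Whether the real library should expose such a call explicitly or hook it into lcore teardown is a separate design question.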
> > >Even if we'd decide to keep rte_pmu_read() as static inline (still not
> > >sure it is a good idea),
> >
> > We want to save as much cpu cycles as we possibly can and inlining does
> > helps in that matter.
>
> Ok, so asking same question from different thread: how many cycles it will
> save?
> What is the difference in terms of performance when you have this function
> inlined vs not inlined?

We expect to use this in our in-house profiler library. For this reason, I
have a very strong preference for absolute maximum performance.

Reading PMU events is for performance profiling, so I expect other potential
users of the PMU library to share my opinion on this.
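
One way to put a number on the inlined vs. not inlined question would be a micro-benchmark along these lines. This is only a sketch under stated assumptions: fake_counter is a made-up stand-in for the mmapped counter page, read_inlined() and read_call() are hypothetical helpers rather than rte_pmu_read(), and the always_inline/noinline attributes are GCC/Clang extensions. It shows the method, not a result. Real numbers would have to come from the actual function on the target CPU.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a counter value read from the mmapped page.
 * volatile keeps the compiler from folding the loads away. */
static volatile uint64_t fake_counter = 1;

/* Variant forced inline at every call site (GCC/Clang attribute). */
static inline __attribute__((always_inline)) uint64_t read_inlined(void)
{
	return fake_counter;
}

/* Variant forced to stay a real function call (GCC/Clang attribute). */
static __attribute__((noinline)) uint64_t read_call(void)
{
	return fake_counter;
}

struct bench_result {
	uint64_t sum;   /* accumulated reads, keeps the loop observable */
	clock_t ticks;  /* CPU time spent in the loop, in clock() ticks */
};

static struct bench_result bench_inlined(uint64_t iters)
{
	struct bench_result r = { 0, 0 };
	clock_t start = clock();

	for (uint64_t i = 0; i < iters; i++)
		r.sum += read_inlined();
	r.ticks = clock() - start;
	return r;
}

static struct bench_result bench_call(uint64_t iters)
{
	struct bench_result r = { 0, 0 };
	clock_t start = clock();

	for (uint64_t i = 0; i < iters; i++)
		r.sum += read_call();
	r.ticks = clock() - start;
	return r;
}
```

Running both with a large iteration count and dividing the tick difference by the count gives a rough per-call overhead. clock() is coarse, so a cycle counter would be a better timer for a serious measurement.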