On Fri, 17 Jan 2025 at 1:46, Steven Rostedt <rost...@goodmis.org> wrote:
> Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
> a thread per CPU (or a set of CPUs) where the thread is responsible for
> handling the data.
>
>
>     CPU_ZERO_S(cpu_size, cpusetp);
>     CPU_SET_S(cpu, cpu_size, cpusetp);
>     retval = tracefs_iterate_raw_events(trace->tep,
>                                         trace->inst,
>                                         cpusetp,
>                                         cpu_size,
>                                         collect_registered_events,
>                                         trace);
>
> And then that iteration will only read over a subset of CPUs. Each thread
> can do a different subset and then it should be able to keep up.
>

That's a good idea, I didn't think of that. But it doesn't help much in a
scenario where rtla is pinned to a few housekeeping CPUs with -H, which is
used for testing isolated-CPU-based setups.

I was thinking of turning timerlat_hist_handler/timerlat_top_handler into a
BPF program and having it executed right after the sample is created, e.g.
by using the BPF perf interface to hook it to a tracepoint event. The
histogram/counter would be stored in BPF maps, which would be merely copied
over in the main loop. This is essentially how cyclictest does it, except
that cyclictest does it in userspace.

I expect this solution to have good performance, but the obvious downside
is that it requires BPF. This is not a problem for us, but it might be for
other rtla users, and we'd likely have to keep both implementations of
sample processing in the code.

Also, before even starting with that, it would likely be necessary to
remove the duplicate code throughout timerlat/osnoise and test it properly,
so we don't have to make the same code changes twice or four times.

Tomas
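
P.S. To make the idea a bit more concrete, here is a rough userspace model of the data flow I have in mind. It is only a sketch, not BPF code and not taken from rtla: in the real thing, hist_record() would be the BPF program attached to the sample tracepoint and the array would be a per-CPU BPF map. All names, the CPU count, and the bucket layout are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define NR_CPUS_MODEL 4   /* assumed CPU count, for the sketch only */
#define NR_BUCKETS    8   /* assumed number of histogram buckets */
#define BUCKET_US     10  /* assumed bucket width in microseconds */

/* Hypothetical per-CPU state; stands in for one per-CPU BPF map entry. */
struct cpu_hist {
	uint64_t bucket[NR_BUCKETS]; /* last bucket collects overflow */
	uint64_t count;
	uint64_t max_us;
};

static struct cpu_hist hist[NR_CPUS_MODEL];

/* Runs once per sample, on the CPU that produced the sample. */
static void hist_record(unsigned int cpu, uint64_t latency_us)
{
	uint64_t idx = latency_us / BUCKET_US;
	struct cpu_hist *h;

	if (cpu >= NR_CPUS_MODEL)
		return;
	h = &hist[cpu];

	if (idx >= NR_BUCKETS)
		idx = NR_BUCKETS - 1;
	h->bucket[idx]++;
	h->count++;
	if (latency_us > h->max_us)
		h->max_us = latency_us;
}

/* Main loop side: copy out and sum the per-CPU data. */
static void hist_snapshot(struct cpu_hist *total)
{
	unsigned int cpu, i;

	memset(total, 0, sizeof(*total));
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		for (i = 0; i < NR_BUCKETS; i++)
			total->bucket[i] += hist[cpu].bucket[i];
		total->count += hist[cpu].count;
		if (hist[cpu].max_us > total->max_us)
			total->max_us = hist[cpu].max_us;
	}
}
```

The point is that the work in hist_snapshot() depends only on the number of CPUs and buckets, not on the sample rate, so the main loop should keep up even when pinned to a few housekeeping CPUs.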