This email sketches an 'unwinder cache' interface for libdwfl, derived from recent eu-stacktrace code [1] and based on Christian Hergert's adaptation of eu-stacktrace as sysprof-live-unwinder [2]. The intent is to remove the need for a lot of boilerplate code that would be identical among profiling tools that use libdwfl to unwind perf_events stack samples. But since this becomes a library design, it's in need of feedback and bikeshedding.
[1]: https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace [2]: https://gitlab.gnome.org/GNOME/sysprof/-/merge_requests/110 Here's my initial take on what to implement, keeping things as simple as possible. I'll follow up with more emails if or when my understanding of the best design evolves during further prototyping. (1) Suggested interface for a new struct Dwfl_Process_Table: Dwfl_Process_Table *dwfl_process_table_begin (const Dwfl_Callbacks *callbacks); Dwfl *dwfl_process_table_find_pid (Dwfl_Process_Table *dwfltab, pid_t pid, bool create_if_missing); Dwfl, although documented in the header as a "session with the library", actually supports reporting/attaching one process at a time. Its real meaning is therefore "elfutils data for working with a particular Dwfl_Process". So, for global profiling, eu-stacktrace or the profiler end up creating a table pid->Dwfl, with one Dwfl structure per process. I'd like elfutils to provide this table, and in the implementation we could work on removing redundant work between processes. Looking at Dwfl_Module, the relocations are obviously specific to each process, but a lot of the Elf and Dwarf data is potentially shareable between Dwfl instances. (2) Suggested interface for existing struct Dwfl: dwfl_perf_sample_getframes (dwfl, tid, stack, stack_size, regs, n_regs, abi, perf_regs_mask, int (*callback) (Dwfl_Frame *state, void *arg), void *arg); /* could also implement dwfl_thread_perf_sample_getframes(thread, ... args as above ...) */ /* could package the args from stack .. perf_regs_mask into a struct Dwfl_Stack_Sample? */ This implements the unwinding work that eu-stacktrace currently does on behalf of the profiler. (3) Another thing the profiler needs to do is react to perf mmap events by extending the Dwfl's representation of the process. Currently this is implementable by calling dwfl_report_begin_add and imitating what dwfl_linux_proc_maps_report does. The code to do this might be worth rolling into libdwfl as a call along the lines of dwfl_report_proc_map (dwfl, filename, start, end, inode); though the exact details have to be worked out. I'm sure there's some pitfalls here I haven't discovered yet. === * Why not the existing eu-stacktrace fifo interface? In practice, the fifo solution proves to be unsatisfying in terms of perceived performance -- extra hoops through which the data must jump between processes -- and maintainability -- the Sysprof binary format needs an extension for representing stack data, with nontrivial details to bikeshed there. The bikeshedding effort may as well be spent on designing a library interface to the eu-stacktrace functionality. * Why do we need a Dwfl_Process_Table struct independent of Dwfl? As mentioned above, putting it in the library gives us a place to implement caching of data/work that's redundant between different Dwfls. There's a different way of designing this that would involve having Dwfl be the global session struct and replacing the single Dwfl::process field with a table of processes, but that requires redoing the existing library interface to be parametrized by Dwfl_Process instead of just Dwfl, grossly disruptive relative to the level of functionality I'm actually adding. We could also have Dwfls share data through a hidden global variable, but it seems gross to enable such sneaky caching without the user of the library explicitly requesting it via Dwfl_Process_Table. All the best, Serhei