On 12/10/24 16:42, Serhei Makarov wrote: > This email sketches an 'unwinder cache' interface for libdwfl, derived from > recent eu-stacktrace code [1] and based on Christian Hergert's adaptation of > eu-stacktrace as sysprof-live-unwinder [2]. The intent is to remove the need > for a lot of boilerplate code that would be identical among profiling tools > that use libdwfl to unwind perf_events stack samples. But since this becomes > a library design, it's in need of feedback and bikeshedding. > > [1]: > https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace > [2]: https://gitlab.gnome.org/GNOME/sysprof/-/merge_requests/110 > > Here's my initial take on what to implement, keeping things as simple as > possible. I'll follow up with more emails if or when my understanding of the > best design evolves during further prototyping. > > (1) Suggested interface for a new struct Dwfl_Process_Table: > > Dwfl_Process_Table *dwfl_process_table_begin (const Dwfl_Callbacks > *callbacks); > Dwfl *dwfl_process_table_find_pid (Dwfl_Process_Table *dwfltab, pid_t pid, > bool > create_if_missing); > > Dwfl, although documented in the header as a "session with the library", > actually supports reporting/attaching one process at a time. Its real meaning > is therefore "elfutils data for working with a particular Dwfl_Process". > > So, for global profiling, eu-stacktrace or the profiler end up creating a > table pid->Dwfl, with one Dwfl structure per process. I'd like elfutils to > provide this table, and in the implementation we could work on removing > redundant work between processes. Looking at Dwfl_Module, the relocations are > obviously specific to each process, but a lot of the Elf and Dwarf data is > potentially shareable between Dwfl instances. > > (2) Suggested interface for existing struct Dwfl: > > dwfl_perf_sample_getframes (dwfl, tid, > stack, stack_size, > regs, n_regs, abi, > perf_regs_mask, > int (*callback) > (Dwfl_Frame *state, void *arg), > void *arg); > /* could also implement dwfl_thread_perf_sample_getframes(thread, ... args as > above ...) */ > /* could package the args from stack .. perf_regs_mask into a struct > Dwfl_Stack_Sample? */ > > This implements the unwinding work that eu-stacktrace currently does on > behalf of the profiler. > > (3) Another thing the profiler needs to do is react to perf mmap events by > extending the Dwfl's representation of the process. Currently this is > implementable by calling dwfl_report_begin_add and imitating what > dwfl_linux_proc_maps_report does. The code to do this might be worth rolling > into libdwfl as a call along the lines of > > dwfl_report_proc_map (dwfl, filename, start, end, inode);
Hi Serhei, The dwfl_report_proc_map() will handle when things are added to the memory map. Is there anything to handle the inverse event when memory is removed from the process memory map, such as a dlcose()? -Will > > though the exact details have to be worked out. I'm sure there's some > pitfalls here I haven't discovered yet. > > === > > * Why not the existing eu-stacktrace fifo interface? > > In practice, the fifo solution proves to be unsatisfying in terms of > perceived performance -- extra hoops through which the data must jump between > processes -- and maintainability -- the Sysprof binary format needs an > extension for representing stack data, with nontrivial details to bikeshed > there. The bikeshedding effort may as well be spent on designing a library > interface to the eu-stacktrace functionality. > > * Why do we need a Dwfl_Process_Table struct independent of Dwfl? > > As mentioned above, putting it in the library gives us a place to implement > caching of data/work that's redundant between different Dwfls. There's a > different way of designing this that would involve having Dwfl be the global > session struct and replacing the single Dwfl::process field with a table of > processes, but that requires redoing the existing library interface to be > parametrized by Dwfl_Process instead of just Dwfl, grossly disruptive > relative to the level of functionality I'm actually adding. We could also > have Dwfls share data through a hidden global variable, but it seems gross to > enable such sneaky caching without the user of the library explicitly > requesting it via Dwfl_Process_Table. > > All the best, > Serhei >