On 12/10/24 16:42, Serhei Makarov wrote:
> This email sketches an 'unwinder cache' interface for libdwfl, derived from 
> recent eu-stacktrace code [1] and based on Christian Hergert's adaptation of 
> eu-stacktrace as sysprof-live-unwinder [2]. The intent is to remove the need 
> for a lot of boilerplate code that would be identical among profiling tools 
> that use libdwfl to unwind perf_events stack samples. But since this becomes 
> a library design, it's in need of feedback and bikeshedding.
> 
> [1]: 
> https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace
> [2]: https://gitlab.gnome.org/GNOME/sysprof/-/merge_requests/110
> 
> Here's my initial take on what to implement, keeping things as simple as 
> possible. I'll follow up with more emails if or when my understanding of the 
> best design evolves during further prototyping.
> 
> (1) Suggested interface for a new struct Dwfl_Process_Table:
> 
> Dwfl_Process_Table *dwfl_process_table_begin (const Dwfl_Callbacks 
> *callbacks);
> Dwfl *dwfl_process_table_find_pid (Dwfl_Process_Table *dwfltab, pid_t pid,
>                                                              bool 
> create_if_missing);
> 
> Dwfl, although documented in the header as a "session with the library", 
> actually supports reporting/attaching one process at a time. Its real meaning 
> is therefore "elfutils data for working with a particular Dwfl_Process".
> 
> So, for global profiling, eu-stacktrace or the profiler end up creating a 
> table pid->Dwfl, with one Dwfl structure per process. I'd like elfutils to 
> provide this table, and in the implementation we could work on removing 
> redundant work between processes. Looking at Dwfl_Module, the relocations are 
> obviously specific to each process, but a lot of the Elf and Dwarf data is 
> potentially shareable between Dwfl instances.
> 
> (2) Suggested interface for existing struct Dwfl:
> 
> dwfl_perf_sample_getframes (dwfl, tid,
>                                                    stack, stack_size,
>                                                    regs, n_regs, abi,
>                                                    perf_regs_mask,
>                                                    int (*callback) 
> (Dwfl_Frame *state, void *arg),
>                                                    void *arg);
> /* could also implement dwfl_thread_perf_sample_getframes(thread, ... args as 
> above ...) */
> /* could package the args from stack .. perf_regs_mask into a struct 
> Dwfl_Stack_Sample? */
> 
> This implements the unwinding work that eu-stacktrace currently does on 
> behalf of the profiler.
> 
> (3) Another thing the profiler needs to do is react to perf mmap events by 
> extending the Dwfl's representation of the process. Currently this is 
> implementable by calling dwfl_report_begin_add and imitating what 
> dwfl_linux_proc_maps_report does. The code to do this might be worth rolling 
> into libdwfl as a call along the lines of
> 
> dwfl_report_proc_map (dwfl, filename, start, end, inode);

Hi Serhei,

The dwfl_report_proc_map() will handle when things are added to the memory map. 
 Is there anything to handle the inverse event when memory is removed from the 
process memory map, such as a dlcose()?

-Will

> 
> though the exact details have to be worked out. I'm sure there's some 
> pitfalls here I haven't discovered yet.
> 
> ===
> 
> * Why not the existing eu-stacktrace fifo interface?
> 
> In practice, the fifo solution proves to be unsatisfying in terms of 
> perceived performance -- extra hoops through which the data must jump between 
> processes -- and maintainability -- the Sysprof binary format needs an 
> extension for representing stack data, with nontrivial details to bikeshed 
> there. The bikeshedding effort may as well be spent on designing a library 
> interface to the eu-stacktrace functionality.
> 
> * Why do we need a Dwfl_Process_Table struct independent of Dwfl?
> 
> As mentioned above, putting it in the library gives us a place to implement 
> caching of data/work that's redundant between different Dwfls. There's a 
> different way of designing this that would involve having Dwfl be the global 
> session struct and replacing the single Dwfl::process field with a table of 
> processes, but that requires redoing the existing library interface to be 
> parametrized by Dwfl_Process instead of just Dwfl, grossly disruptive 
> relative to the level of functionality I'm actually adding. We could also 
> have Dwfls share data through a hidden global variable, but it seems gross to 
> enable such sneaky caching without the user of the library explicitly 
> requesting it via Dwfl_Process_Table.
> 
> All the best,
>     Serhei
> 

Reply via email to