This email sketches an 'unwinder cache' interface for libdwfl, derived from 
recent eu-stacktrace code [1] and based on Christian Hergert's adaptation of 
eu-stacktrace as sysprof-live-unwinder [2]. The intent is to remove the need 
for a lot of boilerplate code that would be identical among profiling tools 
that use libdwfl to unwind perf_events stack samples. But since this becomes a 
library design, it's in need of feedback and bikeshedding.

[1]: 
https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace
[2]: https://gitlab.gnome.org/GNOME/sysprof/-/merge_requests/110

Here's my initial take on what to implement, keeping things as simple as 
possible. I'll follow up with more emails if or when my understanding of the 
best design evolves during further prototyping.

(1) Suggested interface for a new struct Dwfl_Process_Table:

Dwfl_Process_Table *dwfl_process_table_begin (const Dwfl_Callbacks *callbacks);
Dwfl *dwfl_process_table_find_pid (Dwfl_Process_Table *dwfltab, pid_t pid,
                                                             bool 
create_if_missing);

Dwfl, although documented in the header as a "session with the library", 
actually supports reporting/attaching one process at a time. Its real meaning 
is therefore "elfutils data for working with a particular Dwfl_Process".

So, for global profiling, eu-stacktrace or the profiler end up creating a table 
pid->Dwfl, with one Dwfl structure per process. I'd like elfutils to provide 
this table, and in the implementation we could work on removing redundant work 
between processes. Looking at Dwfl_Module, the relocations are obviously 
specific to each process, but a lot of the Elf and Dwarf data is potentially 
shareable between Dwfl instances.

(2) Suggested interface for existing struct Dwfl:

dwfl_perf_sample_getframes (dwfl, tid,
                                                   stack, stack_size,
                                                   regs, n_regs, abi,
                                                   perf_regs_mask,
                                                   int (*callback) (Dwfl_Frame 
*state, void *arg),
                                                   void *arg);
/* could also implement dwfl_thread_perf_sample_getframes(thread, ... args as 
above ...) */
/* could package the args from stack .. perf_regs_mask into a struct 
Dwfl_Stack_Sample? */

This implements the unwinding work that eu-stacktrace currently does on behalf 
of the profiler.

(3) Another thing the profiler needs to do is react to perf mmap events by 
extending the Dwfl's representation of the process. Currently this is 
implementable by calling dwfl_report_begin_add and imitating what 
dwfl_linux_proc_maps_report does. The code to do this might be worth rolling 
into libdwfl as a call along the lines of

dwfl_report_proc_map (dwfl, filename, start, end, inode);

though the exact details have to be worked out. I'm sure there's some pitfalls 
here I haven't discovered yet.

===

* Why not the existing eu-stacktrace fifo interface?

In practice, the fifo solution proves to be unsatisfying in terms of perceived 
performance -- extra hoops through which the data must jump between processes 
-- and maintainability -- the Sysprof binary format needs an extension for 
representing stack data, with nontrivial details to bikeshed there. The 
bikeshedding effort may as well be spent on designing a library interface to 
the eu-stacktrace functionality.

* Why do we need a Dwfl_Process_Table struct independent of Dwfl?

As mentioned above, putting it in the library gives us a place to implement 
caching of data/work that's redundant between different Dwfls. There's a 
different way of designing this that would involve having Dwfl be the global 
session struct and replacing the single Dwfl::process field with a table of 
processes, but that requires redoing the existing library interface to be 
parametrized by Dwfl_Process instead of just Dwfl, grossly disruptive relative 
to the level of functionality I'm actually adding. We could also have Dwfls 
share data through a hidden global variable, but it seems gross to enable such 
sneaky caching without the user of the library explicitly requesting it via 
Dwfl_Process_Table.

All the best,
    Serhei

Reply via email to