Hello all,

I wanted to open up public discussion on a project I'm looking to develop in elfutils, tentatively named eu-stacktrace. I've started to write code on branch users/serhei/eu-stacktrace.

eu-stacktrace will be a utility to process a stream of raw stack samples (such as those obtained from the Linux kernel's PERF_SAMPLE_STACK facility) into a stream of stack traces (such as the ones obtained from PERF_SAMPLE_CALLCHAIN), freeing various profiling utilities from having to implement their own backtracing logic.

My initial goal is to make the tool work with a (slightly modified) version of the sysprof profiler. If all goes well, I hope to produce a demonstration of sysprof using elfutils eu-stacktrace and eh_frame data to produce useful profiles on code compiled with -fomit-frame-pointer. (I'm aware that profiling -fomit-frame-pointer programs has been the topic of some fairly contentious recent discussion, which I'm not looking to rehash; I'm just interested to see if I can add a viable technical solution to the mix.)

I'm cc:ing chergert and posting a link to this thread on GNOME Discourse so that sysprof developers can keep track of the discussion.

For the time being, eu-stacktrace is meant to be fed data from a profiling tool via a pipe or fifo. We will see how well this idea works as implementation proceeds.

The eventual goal is to work with various profiler data formats. After sysprof, supporting perf's native data format is an obvious prerequisite for merging the users/serhei/eu-stacktrace branch into elfutils. Ideally, I would like eu-stacktrace to also convert between different profile data formats (e.g. taking sysprof data as input and emitting perf data, and vice versa), but this may be out of scope given the amount of code that would need to be written to handle profile data other than stack traces.

Usage instructions will be kept up to date in README.eu-stacktrace on the topic branch:

- https://sourceware.org/cgit/elfutils/tree/README.eu-stacktrace?h=users/serhei/eu-stacktrace

All the best,
Serhei Makarov

PS. More information follows.

* * *

My current roadmap for the prototype with sysprof is as follows:

# 1. Get build-ids of all executables as sysprof encounters them.

Build-id data can be obtained by coding sysprof to support PERF_RECORD_MMAP2 rather than PERF_RECORD_MMAP. As far as I understand, there are indications this would be a welcome patch for the sysprof project.

# 2. Get stack samples with PERF_SAMPLE_STACK; pipe to eu-stacktrace.

Within sysprof, add an option to switch the perf data source to use PERF_SAMPLE_STACK rather than PERF_SAMPLE_CALLCHAIN. The capture writer will write the data to a pipe to be processed by eu-stacktrace; thus the stack samples never hit the disk.

Within eu-stacktrace, I'm implementing the code to accept data in sysprof format, as defined in the public header (e.g. the sysprof-devel package on Fedora provides /usr/include/sysprof-4/sysprof-capture-types.h).

# 3. Implement eh_frame / DWARF-via-debuginfod data retrieval in eu-stacktrace.

I am hoping that eh_frame data will be sufficient, but elfutils includes support for retrieving data via debuginfod as a fallback.

There are a number of use cases relating to executables inside containers that sysprof handles with clever logic. If I want to match the profile coverage of plain sysprof with sysprof+eu-stacktrace, some contemplation is required as to whether I need to duplicate that logic, or to leverage sysprof's codebase directly from eu-stacktrace.

# 4. Implement and benchmark naive unwinding of all samples as they come in.

Within eu-stacktrace, once we have the stack samples and the .eh_frame data accessible, use them to unwind each stack sample and output the resulting compact stack traces as callchain frames in sysprof's currently-existing format.

Resulting pipeline: sysprof writes PERF_SAMPLE_STACK samples to a pipe; eu-stacktrace reads them, unwinds them, and emits callchain frames in sysprof's format.

Of course, it is possible that eu-stacktrace will be so slow that an unmanageable amount of data piles up in the pipe. This would be guaranteed if we need to retrieve data from debuginfod.

# 5. If needed, scope out / implement async preparation of unwinder data.

If eu-stacktrace cannot handle all of the stack samples in real time, there is a scheme that will allow us to reach good-enough profile coverage (e.g. 90%+ on a long-enough run) by caching data structures pertaining to a repeatedly-encountered code location and using a JIT-style 'priming' scheme.

The overall idea: the first time we encounter a code location, we would drop the sample and initiate whatever preparation procedure (setting up data structures or retrieving data via debuginfod) is needed to unwind from that code location successfully. After the preparation procedure completes, we will be able to unwind future samples based at that code location.

Within sysprof, we could add code to display a percentage indicator of how many samples in the profile were successfully converted to stack traces. This could be provided by having eu-stacktrace export the number to a procfs-style file which sysprof can poll and incorporate into its live statistic UI that already displays a running total of the number of samples.

As the eu-stacktrace cache is primed with data, the success rate will rise -- in my simulated scenarios, it routinely reached 90%+ -- and the sysprof user can keep an eye on the indicator and stop profiling once the percentage has reached a satisfyingly high value.

I am sure the details will be complex and interesting to work out, but I also hope this is not actually needed outside of unusual cases.

# 6. Implement support for stitching stack traces to always reach the root.

For a top-down profile visualization, it's not crucial to accurately unwind 100% of the samples, but it is important that the accurately-unwound samples reach the root of the stack. However, PERF_SAMPLE_STACK only provides a fixed-size sample of the stack, which may not include the root. This can be worked around with per-thread caching of the last-known state of the entire stack. Frank Ch. Eigler and I brainstormed around 5-6 possibilities for how to maintain this cache.
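
To make the stitching idea in step 6 concrete, here is a minimal sketch of one possible approach; it is not necessarily any of the possibilities mentioned above. It assumes each thread keeps the last full trace as an array of return addresses ordered innermost-first, and that a truncated trace can be joined to the cached one wherever its outermost address reappears. The function name stitch_trace and the match-by-address heuristic are hypothetical; a real implementation would likely also compare stack pointers to avoid false matches under recursion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of elfutils: extend a truncated stack
   trace using the last full trace cached for the same thread.
   Frames are return addresses ordered innermost first.
   Returns the number of frames written to out.  */
size_t
stitch_trace (const uint64_t *cached, size_t n_cached,
              const uint64_t *partial, size_t n_partial,
              uint64_t *out, size_t out_cap)
{
  assert (out != NULL || out_cap == 0);

  size_t n = 0;

  /* Copy the freshly unwound frames first.  */
  for (size_t i = 0; i < n_partial && n < out_cap; i++)
    out[n++] = partial[i];

  /* Nothing to match on, or the output buffer is already full.  */
  if (n_partial == 0 || n < n_partial)
    return n;

  /* Find the outermost fresh frame in the cached stack and, if it is
     there, append the cached frames that lie outside of it.  */
  uint64_t join = partial[n_partial - 1];
  for (size_t i = 0; i < n_cached; i++)
    if (cached[i] == join)
      {
        for (size_t j = i + 1; j < n_cached && n < out_cap; j++)
          out[n++] = cached[j];
        break;
      }

  return n;
}
```

If no cached frame matches, the truncated trace is returned unchanged, so the caller can decide whether to drop it or report it as not reaching the root.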

* * *

Based on the above staging, the required changes to sysprof would be reduced to the following four:

1. Collect build-id data via PERF_RECORD_MMAP2 rather than PERF_RECORD_MMAP
2. Collect stack samples via PERF_SAMPLE_STACK rather than PERF_SAMPLE_CALLCHAIN
3. Output the sample frames to a pipe connected to eu-stacktrace
4. (If needed,) poll a procfs-style file updated by eu-stacktrace to receive and display the percentage of successfully-unwound frames
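
For changes 1 and 2, the following is a rough sketch of how a perf_event_attr might be filled in to request raw user stack samples and MMAP2 records. It is not sysprof's actual code. In linux/perf_event.h the stack sampling bit is spelled PERF_SAMPLE_STACK_USER, and it is paired with PERF_SAMPLE_REGS_USER so the unwinder has initial register values. The function name, the choice of a CPU-clock software event, and the 1000 Hz rate are assumptions made for illustration; the register mask is architecture-specific and left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Sketch only (hypothetical helper): prepare an attr that asks the
   kernel for raw user stack samples plus MMAP2 records.  The caller
   would pass the result to the perf_event_open system call.  */
void
init_stack_sampling_attr (struct perf_event_attr *attr,
                          uint32_t stack_bytes, uint64_t regs_mask)
{
  assert (attr != NULL);

  memset (attr, 0, sizeof *attr);
  attr->size = sizeof *attr;
  attr->type = PERF_TYPE_SOFTWARE;
  attr->config = PERF_COUNT_SW_CPU_CLOCK;
  attr->freq = 1;
  attr->sample_freq = 1000;     /* arbitrary rate chosen for the sketch */

  /* Raw stack bytes and user registers instead of a kernel-built
     callchain (PERF_SAMPLE_CALLCHAIN is deliberately not set).  */
  attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME
                      | PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
  attr->sample_regs_user = regs_mask;     /* arch-specific bitmask */
  attr->sample_stack_user = stack_bytes;  /* bytes of stack per sample;
                                             should be a multiple of 8 */

  /* Request mapping records in the extended MMAP2 format.  Newer
     kernels have a separate attr bit for embedding build-ids in these
     records; it is omitted here so the sketch builds with older
     headers.  */
  attr->mmap = 1;
  attr->mmap2 = 1;
  attr->comm = 1;
}
```

A caller might use init_stack_sampling_attr (&attr, 8192, mask) and then open the event per CPU; larger stack_bytes values capture deeper stacks at the cost of more data in the pipe.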