Hi Walter, I've only done a brief scan of the document but, in general, I'm in favor of the goals, aim, and approach. One thing I think would be good is to compare/contrast against rr in an "exploring alternatives" section of the document. I think the document should also be made available/adapted as part of the documentation on "why lldb is implementing this feature/what it can be used for".
Thanks so much for starting this and looking forward to the work and collaboration.

-eric

On Thu, Sep 17, 2020 at 8:28 PM Walter via lldb-dev <lldb-dev@lists.llvm.org> wrote:

> Hi all,
>
> Here I propose, along with Greg Clayton, Processor Trace support for LLDB. I’m attaching a link to the document that contains this proposal, in case that’s easier for you to read:
> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
> Please make any comments on this mailing list.
>
> If you want to quickly know what Processor Trace can do, you can read this:
> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace
>
> Any comments are appreciated, especially the ones regarding the commands the user will interact with.
>
> Thanks,
> Walter Erquinigo.
>
> # RFC: Processor Trace Support in LLDB
>
> # What is processor tracing?
>
> Processor tracing works by capturing information about the execution of a process so that the control flow of the program can be reconstructed later.
> Implementations of this are Intel Processor Trace for x86 and x86_64 (https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html) and ARM CoreSight for some ARM devices (https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace).
>
> As a clarifying example, with these technologies it’s possible to trace all the threads of a process and, after the process has finished, reconstruct every single instruction address each thread has executed. The trace can also include additional information like timestamps, async CPU events, kernel instructions, bus clock ratio changes, etc. On the other hand, memory and registers are not traced, in order to limit the size of the trace.
>
> # Intel Processor Trace as the first implementation
>
> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that similar technologies can be onboarded in LLDB in the future.
>
> Intel PT has the following features:
>
> * Control flow tracing in a highly encoded format
> * 3% to 5% slowdown when capturing
> * Neither memory nor registers are captured
> * Kernel tracing support
> * Timestamps of branches are produced, which can be used for profiling
> * Adjustable size of trace buffer
> * Supported on most Intel CPUs since 2015
> * x86 and x86_64 only
> * Official support only on Linux
> * Basic support on Windows
> * Decoding/analysis can be done on any operating system
>
> A very nice introduction to Intel PT can be found at <https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html> and <https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace>.
> Both are highly recommended for fully grasping the impact of this project.
>
> More technical details are in <https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt>.
>
> Even more technical details are in the processor manual: <https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf>
>
> # Basic Definitions
>
> * Trace file: A trace file contains the target addresses of each branch or jump within the program execution, in a highly encoded format.
> * Capturing: The act of tracing a process and producing a trace file.
> * Decoding: Decoding outputs a sequential list of instructions given a trace file and the images of a process. Decoding is generally an offline step as it’s expensive.
> * Trace buffer: In order to limit the size of the trace, an in-memory circular buffer can be used, keeping the most recent branching information. The trace file is a snapshot of this.
> * Gap: Sporadically some branching information can be lost or be impossible to decode, which creates a gap in the reconstructed control flow.
>
> # New LLDB features
>
> * Loading traces: We want to load traces, potentially from other computers, and have LLDB symbolicate them.
>   A flow like the following should be possible:
>
> ```
> $ trace load /path/to/trace
> $ trace dump --instructions
> pid: '1234', tid: '1981309'
> a.out`main
> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
> a.out`bar()
> [56] 0x40053b <+46>: retq
> [55] 0x40053a <+45>: leave
> [54] 0x400537 <+42>: movl -0x4(%rbp), %eax
> [53] 0x400535 <+40>: jle 0x400525 ; <+24> at main.cpp:7
> [52] 0x400531 <+36>: cmpl $0x3, -0x8(%rbp)
> [51] 0x40052d <+32>: addl $0x1, -0x8(%rbp)
> [50] 0x40052a <+29>: addl %eax, -0x4(%rbp)
> a.out`foo()
> [49] 0x400567 <+15>: retq
> [48] 0x400566 <+14>: popq %rbp
> [47] 0x400563 <+11>: movl -0x4(%rbp), %eax
> [46] 0x40055c <+4>: movl $0x2a, -0x4(%rbp)
>
> ...
> [1] 0x400559 <+1>: movq %rsp, %rbp
> [0] 0x400558 <+0>: pushq %rbp
>
> // Format:
> // [instruction index] <instruction disassembly>
> ```
>
>   Notice the resemblance to loading a core file, but in this case we can get the control flow, printed in reverse order in this example.
>
> * Decoding: LLDB can use libipt (https://github.com/intel/libipt), which is the low-level Intel PT decoding library, to convert trace files into instructions.
> * Showing instructions: LLDB can output the list of instructions of the control flow, as shown above.
> * Showing function calls: Similarly, LLDB can print a hierarchical view of the function calls. A flow like this should be possible:
>
> ```
> $ trace load /path/to/trace
> $ trace dump --function-calls
> pid: '1234', tid: '1981309'
> [50] a.out`bar() 0x40052a
> [45] a.out`zaz() 0x400558
> [40] a.out`baz() 0x400559
> [30] a.out`foo() 0x400567
> [0] a.out`main 0x400000
> ```
>
>   This functionality allows LLDB to reconstruct the call stack at any point and potentially do reverse debugging.
> * Capturing: LLDB can also do the Intel PT capturing of a live process, so that at any stop the user can do reverse stepping or simply inspect the trace. A possible flow is:
>
> ```
> $ <stopped at main>
> $ b main.cpp:50
> $ trace start intel-pt // this initiates the tracing
> $ continue
> $ <stopped at main.cpp:50>
> $ trace dump --instructions
> pid: '1234', tid: '1981309'
> a.out`main
> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
> a.out`bar()
> [56] 0x40053b <+46>: retq
> [55] 0x40053a <+45>: leave
> ```
>
> * Displaying time information: If the trace contains timing information, we could also display it along with each instruction, e.g.
>
> ```
> a.out`bar()
> [56: 1600284226]: 0x40053b <+46>: retq
> ...
> [4: 1600284200]: 0x40053a <+45>: leave
> // Format:
> // [instruction index: unix timestamp] <instruction disassembly>
> ```
>
>   Furthermore, we could display the time spent in each function.
>
> # Future LLDB features
>
> * Reverse Stepping: With the hierarchical reconstruction of the function calls, along with the individual instructions, LLDB can offer reverse stepping. Operations like reverse-next, reverse-step-out, reverse-continue could work by traversing the trace. We plan to work on this once the features presented above are in place.
> * Trace-based profiling
> * SB API of the mentioned features
>
> # Why is this useful?
>
> * Bug root-causing:
>   * For example, a crash in a production Release build ends up being analyzed with logs, a coredump, and a stack trace. Logs are not comprehensive, and a stack trace only contains the final state of the program. Providing the user with the control flow of the last milliseconds gives a tremendous amount of information that is game-changing in root-causing issues. It could be said that the user goes from a single stack trace to a list of stack traces.
>   * Reverse stepping enables more efficient debugging, as it reduces the number of iterations needed to root-cause bugs. More often than not, reproducing a bug takes a considerable amount of time, and the user needs to reproduce it several times until the correct breakpoints are hit. Giving the user the information of what has been executed so far can help them figure out where to place a breakpoint, or very easily figure out what went wrong.
> * Low cost: unlike other similar technologies, Intel PT has an almost negligible performance cost regardless of whether the build is optimized or not, making it appealing for a wide range of scenarios.
> * This infrastructure can be used for enabling other tools like non-sample-based profilers with instruction-level accuracy, security analyzers that check if certain memory regions are executed, and trace comparators, which could find bugs by comparing similar traces.
>
> # Goals of this document:
>
> * Gather feedback on the basic Trace implementation, which would include the following basic operations: loading, decoding, and dumping.
> * Create awareness about this work.
> * Get a green light on the current set of patches implementing this feature, starting with https://reviews.llvm.org/D85705
>
> # Non-Goals:
>
> * Discuss how reverse-stepping will be implemented. This can be left for another discussion. Once the Trace architecture is in place and robust, reverse-stepping can then be discussed, as it’s a more controversial change than this one.
> * Explain Intel PT thoroughly.
>
> # Existing Tool Support
>
> * GDB has a basic implementation of the features above (https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html) and some ideas are taken from there.
> * Perf is a standalone tool that can do capturing and decoding.
> * The Linux kernel has full support for doing capturing at thread, logical CPU or cgroup level.
> * Intel developed a basic version of Intel PT support in LLDB as an external plugin: https://reviews.llvm.org/D33674 and https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b
>
> # New Trace Commands
>
> Based on this patch (https://reviews.llvm.org/D85705), there would be a common Trace class along with plug-in implementations.
>
> ## Trace loading
>
> ### $ trace load /path/to/trace/settings/file.json
>
> As decoding a trace requires the images of the object files, the trace files and some CPU information, it’s convenient to have a JSON file that describes an entire trace session. The following JSON schema could be used.
>
> ```
> {
>   "trace": {
>     … // plug-in specific information
>   },
>   "processes": [ // process information common to all trace plug-ins
>     {
>       "pid": integer,
>       "triple": string, // llvm-triple
>       "threads": [
>         {
>           "tid": integer,
>           "traceFile": string
>         }
>       ],
>       "modules": [
>         {
>           "systemPath": string, // original path of the module at runtime
>           "file"?: string, // copy of the file if not available at "systemPath"
>           "loadAddress": string, // string address in hex or decimal form
>           "uuid"?: string,
>         }
>       ]
>     }
>   ]
> }
> // Notes:
> // All paths are either absolute or relative to the settings file.
> ```
>
> **Corefiles:**
>
> We plan to extend this schema to support corefiles, but we will leave that out of this discussion, as it can easily be seen as an extension of this basic schema.
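To make the settings file described above more concrete, here is a minimal Python sketch that checks an object against the plug-in independent part of the proposed schema. It is only an illustration, not LLDB code: the names `validate_trace_settings` and `_check_keys`, and the example values (pid, tid, paths, triple), are hypothetical. Only the key names are taken from the schema in the proposal, and the plug-in specific "trace" section is left empty.

```python
import json

# Key names follow the schema proposed above; everything else in this
# sketch (function names, example values) is hypothetical.
THREAD_KEYS = {"tid": int, "traceFile": str}
MODULE_KEYS = {"systemPath": str, "loadAddress": str}
OPTIONAL_MODULE_KEYS = {"file": str, "uuid": str}
PROCESS_KEYS = {"pid": int, "triple": str, "threads": list, "modules": list}


def _check_keys(obj, required, optional, where):
    """Return a list of problems found in one JSON object."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    errors = []
    for key, expected in required.items():
        if key not in obj:
            errors.append(f"{where}: missing '{key}'")
        elif not isinstance(obj[key], expected):
            errors.append(f"{where}: '{key}' must be {expected.__name__}")
    for key, expected in optional.items():
        if key in obj and not isinstance(obj[key], expected):
            errors.append(f"{where}: '{key}' must be {expected.__name__}")
    return errors


def validate_trace_settings(settings):
    """Check the plug-in independent part of a trace settings object."""
    errors = _check_keys(settings, {"trace": dict, "processes": list}, {}, "settings")
    if errors:
        return errors
    for p, process in enumerate(settings["processes"]):
        where = f"processes[{p}]"
        process_errors = _check_keys(process, PROCESS_KEYS, {}, where)
        errors += process_errors
        if process_errors:
            continue
        for t, thread in enumerate(process["threads"]):
            errors += _check_keys(thread, THREAD_KEYS, {}, f"{where}.threads[{t}]")
        for m, module in enumerate(process["modules"]):
            m_where = f"{where}.modules[{m}]"
            module_errors = _check_keys(module, MODULE_KEYS, OPTIONAL_MODULE_KEYS, m_where)
            errors += module_errors
            if not module_errors:
                try:
                    int(module["loadAddress"], 0)  # base 0 accepts "0x..." hex and decimal
                except ValueError:
                    errors.append(f"{m_where}: 'loadAddress' is not a hex or decimal number")
    return errors


# Hypothetical settings for a single process with one traced thread.
example = json.loads("""
{
  "trace": {},
  "processes": [
    {
      "pid": 1234,
      "triple": "x86_64-unknown-linux-gnu",
      "threads": [{"tid": 1981309, "traceFile": "1981309.trace"}],
      "modules": [{"systemPath": "/tmp/a.out", "loadAddress": "0x400000"}]
    }
  ]
}
""")
print(validate_trace_settings(example))
```

Returning a list of problems, rather than stopping at the first one, is a design choice of this sketch; it lets a user fix a hand-written settings file in one pass.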
> **Implementation details:**
>
> To make our first implementation easier, we’ll ask for an individual trace file per thread. This is the simplest collection mode for Intel PT.
>
> The entire JSON file will be translated into a Trace object, which contains the trace information of each thread and process in it.
>
> Each process in the JSON file will be represented as a new Target. Similarly, threads and modules for each target will be created following the JSON file. This is very similar to what loading a minidump or coredump does.
>
> Each Target will be associated with a Trace, and multiple targets can share the same Trace. The contract is that the Trace is assumed to end at the current PC of each thread of the target.
>
> ### $ trace schema <plug-in>
>
> This command prints the JSON schema of the trace settings file for the provided plug-in. It would output something similar to this:
>
> ```
> {
>   "trace": {
>     "type": "intel-pt",
>     "pt_cpu": {
>       "vendor": "intel" | "unknown",
>       "family": integer,
>       "model": integer,
>       "stepping": integer
>     }
>   },
>   "processes": [
>     {
>       "pid": integer,
>       "triple": string, // llvm-triple
>       "threads": [
>         {
>           "tid": integer,
>           "traceFile": string
>         }
>       ],
>       "modules": [
>         {
>           "systemPath": string, // original path of the module at runtime
>           "file"?: string, // copy of the file if not available at "systemPath"
>           "loadAddress": string, // string address in hex or decimal form
>           "uuid"?: string,
>         }
>       ]
>     }
>   ]
> }
> // Notes:
> // All paths are either absolute or relative to the settings file.
> ```
>
> ### $ trace dump [--verbose] [-t tid1] [-t tid2] ...
>
> Print the trace information corresponding to the provided thread ids of the currently selected target, which would mainly include the same information as the trace settings file. If no tid is provided, the currently selected thread is used.
> This would be useful for debugging. The information would look like this:
>
> Modules:
>
>   <module info like systemPath, file, load address, uuid, size>
>
> Threads:
>
>   <thread info like location of trace file, number of instructions (if already decoded), number of function calls (if already decoded)>
>
> If --verbose is passed, the original settings.json file is printed as well.
>
> ## Decoder-based commands
>
> The following commands require decoding the trace and are of the form “trace dump <action> [-t <tid>]”. If tids are not specified, then the current thread or the current target will be used.
>
> ### $ trace dump --instructions [-t <tid>] [-c <count> = 10] [-o <offset> = 0]
>
> This command would print the last <count> instructions starting at the given offset from the last instruction in the trace. The output would be similar to that of the “disassemble” command and would include timing information if available.
>
> ```
> $ trace dump --instructions -c 5
> pid: '1234', tid: '1981309'
> a.out`main
> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
> a.out`bar()
> [56] 0x40053b <+46>: retq
> [55] 0x40053a <+45>: leave
> [54] error -13. 'no memory mapped at this address'
> a.out`foo()
> [53] 0x400567 <+15>: retq
> ```
>
> Repeating the command would continue printing where it left off in the last run.
>
> **Implementation details:**
>
> Each instruction output by the decoder is either an actual instruction or an error. An error can be caused by a collection error (e.g. internal CPU buffer overflow error) or a decoding error (e.g. the image of an object file is missing while decoding). These errors represent gaps in the trace and the user should know about them, so we print them accordingly in this dump.
>
> Each instruction (including errors) has an index in the decoded trace, and serves as a checkpoint.
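The indexing described above can be pictured with a small Python sketch. It models a decoded trace as a list whose items are either an instruction address or an error (a gap), and returns the last `count` items starting `offset` items back from the end, most recent first. The names `TraceItem` and `dump_instructions` are hypothetical and the output format is simplified; this is not how LLDB or libipt represent a decoded trace.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceItem:
    """One decoded item: an instruction address, or an error marking a gap."""
    address: Optional[int] = None
    error: Optional[str] = None


def dump_instructions(items: List[TraceItem], count: int = 10, offset: int = 0) -> List[str]:
    """Format up to `count` items, newest first, skipping the newest `offset` items."""
    last = len(items) - 1 - offset          # index of the newest item to print
    first = max(last - count + 1, 0)        # index of the oldest item to print
    lines = []
    for index in range(last, first - 1, -1):
        item = items[index]
        if item.error is not None:
            # Gaps keep their own index so the user can see where they occurred.
            lines.append(f"[{index}] error: {item.error}")
        else:
            lines.append(f"[{index}] {item.address:#x}")
    return lines


# Hypothetical decoded trace, oldest item first.
trace = [
    TraceItem(address=0x400567),
    TraceItem(error="no memory mapped at this address"),
    TraceItem(address=0x40053A),
    TraceItem(address=0x40053B),
    TraceItem(address=0x400549),
]

for line in dump_instructions(trace, count=3):
    print(line)
```

Under this model, repeating the command amounts to calling `dump_instructions` again with `offset` increased by the number of items already printed.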
> ### $ trace dump --function-calls [-t <tid>] [-c <count> = 10] [-o <offset> = 0] [--flat]
>
> This command would print the hierarchical list of function calls. Similar to the “--instructions” command, it would show the last <count> function calls with the given offset from the last instruction. Timing information would be included if available.
>
> ```
> $ trace dump --function-calls
> pid: '1234', tid: '1981309'
> [50] a.out`bar() 0x40052a
> [45] a.out`zaz() 0x400558
> [40] a.out`baz() 0x400559
> [30] a.out`foo() 0x400567
> [0] a.out`main 0x400000
> ```
>
> Repeating the command would continue printing where it left off in the last run.
>
> If --flat is passed, then instead of a hierarchical view, a flat list would be produced.
>
> ## Capturing command
>
> ### $ trace start <plugin_name> [-t <tid>] [--all] [-b <buffer_size_in_KB>]
>
> This command will start tracing the given thread of the currently selected target, or all the threads of that target if “--all” is passed. If “--all” is passed, any thread created after this command will also be traced automatically.
>
> In addition, the optional -b parameter defines the size of each trace buffer to be created. I haven’t yet decided on a default, but 1 MB might be acceptable, as it traces around 1 million instructions on average according to Intel, and that’s more than enough for a useful analysis.
>
> For an initial implementation, the plugin_name parameter will be required (e.g. intel-pt). Later a more automated mechanism for finding the right plugin can be implemented.
>
> **Implementation notes:**
>
> There’s already a basic implementation in lldb as an external plugin.
> It’s in https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/ and was created by https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b
> It hasn’t received much attention and has been mostly unmaintained since it was created. It’s already capable of tracing a given thread and collecting the trace buffer. We plan to reuse that logic, which is already working.
>
> A Trace object will be created and will be associated with the current Target.
>
> Any interaction with the trace, like dumping instructions, will trigger a fetch of the most recent trace buffer, unless it hasn’t changed.
>
> When multiple threads are traced, each one will have its own trace buffer, as sharing one buffer among multiple threads requires knowing when each context switch happened so that the decoded trace can be split correctly among threads. This is beyond the scope of the initial version of this project.
>
> ### $ trace save /path/to/file.json [--copy-images]
>
> This creates a trace bundle for the current process, with its settings saved in the given JSON file. By default, it doesn’t create any copy of the images loaded in the process, unless the “--copy-images” parameter is specified. That parameter is useful for analyzing the trace on a machine other than the one where it was captured.
>
> # Remote Protocol Changes
>
> No further remote protocol changes are required, as https://reviews.llvm.org/D33674 and https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b already introduced them some years ago.
> # Build Requirements
>
> In order to build LLDB with this support, it has to be linked with a build of libipt (https://github.com/intel/libipt), which is the decoder.
>
> # Operating System Requirements for Collection/Tracing
>
> Collection can only be done on Linux, and only if the file /sys/bus/event_source/devices/intel_pt/type exists. The logic gating this feature is already checked in and defined in https://reviews.llvm.org/D33674
>
> # Testing
>
> It’s fortunately straightforward to test this feature. It’s possible to capture traces with perf or with the future “trace start” / “trace save” commands and create trace bundles with their corresponding settings .json file. Analyzing those traces should give the same results on any machine, making testing deterministic. https://reviews.llvm.org/D85705 and its descendants already implement some deterministic tests.
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev