Re: [lldb-dev] RFC: Processor Trace Support in LLDB

Eric Christopher via lldb-dev Fri, 18 Sep 2020 19:58:40 -0700

Hi Walter,

I've only done a brief scan of the document but, in general, I'm in favor of
the goals, aim, and approach. One thing I think would be good is to
compare/contrast against rr in an "exploring alternatives" section of the
document. I think the document should also be made available/adapted to be
part of the documentation on "why lldb is implementing this feature/what it
can be used for/why".


Thanks so much for starting this and looking forward to the work and
collaboration.

-eric

On Thu, Sep 17, 2020 at 8:28 PM Walter via lldb-dev <lldb-dev@lists.llvm.org>
wrote:

> Hi all,
>
> Here I propose, along with Greg Clayton, Processor Trace support for LLDB.
> I’m attaching a link to the document that contains this proposal, in case
> that’s easier for you to read:
> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
> Please make any comments on this mailing list.
>
> If you want to quickly learn what Processor Trace can do, you can read
> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace.
>
> Any comments are appreciated, especially the ones regarding the commands the
> user will interact with.
>
> Thanks,
> Walter Erquinigo.
>
> # RFC: Processor Trace Support in LLDB
>
> # What is processor tracing?
>
> Processor tracing works by capturing information about the execution of a
> process so that the control flow of the program can be reconstructed later.
> Implementations of this are Intel Processor Trace for x86 and x86_64
> ([https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html))
> and ARM CoreSight for some ARM devices
> ([https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace](https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace)).
>
> As a clarifying example, with these technologies it’s possible to trace all
> the threads of a process and, after the process has finished, reconstruct
> every single instruction address each thread has executed. The trace can also
> include additional information like timestamps, async CPU events, kernel
> instructions, bus clock ratio changes, etc. On the other hand, memory and
> register contents are not traced, as a way to limit the size of the trace.
>
> # Intel Processor Trace as the first implementation
>
> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that
> similar technologies can be onboarded in LLDB in the future.
>
> Intel PT has the following features:
>
> *   Control flow tracing in a highly encoded format
> *   3% to 5% slowdown when capturing
> *   No memory or register contents captured
> *   Kernel tracing support
> *   Timestamps of branches are produced, which can be used for profiling
> *   Adjustable size of trace buffer
> *   Supported on most Intel CPUs since 2015
> *   x86 and x86_64 only
> *   Official support only on Linux
> *   Basic support on Windows
> *   Decoding/analysis can be done on any operating system
>
> Very nice introductions to Intel PT can be found at
> [https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
> and
> [https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace](https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace).
> They are highly recommended for fully grasping the impact of this project.
>
> More technical details are in
> [https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt](https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt).
>
> Even more technical details are in the processor manual:
> [https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf)
>
> # Basic Definitions
>
> *   Trace file: A trace file contains, in a highly encoded format, the
> information needed to determine the target of each branch or jump executed
> by the program.
>
> *   Capturing: The act of tracing a process and producing a trace file.
>
> *   Decoding: Given a trace file and the images of a process, decoding
> outputs the sequential list of executed instructions. Decoding is generally
> an offline step, as it’s expensive.
>
> *   Trace buffer: To limit the size of the trace, an in-memory circular
> buffer can be used, keeping the most recent branching information. The trace
> file is a snapshot of this buffer.
>
> *   Gap: Sporadically, some branching information can be lost or be impossible
> to decode, which creates a gap in the reconstructed control flow.
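>
> To make the decoding definition above concrete, here is a minimal sketch
> under a drastically simplified, hypothetical model (`Insn`, `Packet`, and
> `decode` are illustrative names, not libipt API). The image is a dict from
> address to instruction, and the trace is a list of packets in which a `bool`
> is the taken/not-taken outcome of a conditional branch and an `int` is the
> target of an indirect branch or return. This is loosely modeled on Intel PT's
> TNT and TIP packets, which carry much more detail in reality. The image
> supplies what the trace omits, namely instruction sizes and direct branch
> targets, which is why decoding needs both.
>
> ```python
> from dataclasses import dataclass
> from typing import Dict, List, Union
>
> @dataclass
> class Insn:
>     kind: str        # hypothetical kinds: "seq", "jmp", "cond", "indirect"
>     size: int        # instruction length in bytes
>     target: int = 0  # static target of "jmp" and "cond", read from the image
>
> # A bool is a taken/not-taken bit; an int is an indirect-branch target address.
> Packet = Union[bool, int]
>
> def decode(image: Dict[int, Insn], start: int, packets: List[Packet],
>            max_insns: int = 100_000) -> List[Union[int, str]]:
>     """Return executed instruction addresses; the string "gap" marks lost control flow."""
>     out: List[Union[int, str]] = []
>     pc, next_packet = start, 0
>     while len(out) < max_insns:
>         insn = image.get(pc)
>         if insn is None:
>             out.append("gap")  # no image covers pc, so decoding cannot continue
>             break
>         out.append(pc)
>         if insn.kind == "seq":
>             pc += insn.size
>         elif insn.kind == "jmp":
>             pc = insn.target  # direct jump: the image alone determines the target
>         else:
>             if next_packet == len(packets):
>                 break  # the trace ends before this branch's outcome is known
>             packet = packets[next_packet]
>             next_packet += 1
>             if insn.kind == "cond":
>                 pc = insn.target if packet else pc + insn.size
>             else:  # "indirect": only the trace knows where execution went
>                 pc = packet
>     return out
> ```
>
> For example, with an image where `0x12` is a conditional branch back to
> `0x10` and `0x14` is a return, the packets `[True, False, 0x99]` decode to
> `0x10, 0x12, 0x10, 0x12, 0x14` followed by a gap, because no image covers
> `0x99`.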
>
> # New LLDB features
>
> *   Loading traces: We want to load traces, potentially captured on other
> computers, and have LLDB symbolicate them. A flow like the following should
> be possible:
>
>     ```
>     $ trace load /path/to/trace
>     $ trace dump --instructions
>     pid: '1234', tid: '1981309'
>       a.out`main
>       [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>       a.out`bar()
>       [56] 0x40053b <+46>: retq
>       [55] 0x40053a <+45>: leave
>       [54] 0x400537 <+42>: movl   -0x4(%rbp), %eax
>       [53] 0x400535 <+40>: jle    0x400525                  ; <+24> at main.cpp:7
>       [52] 0x400531 <+36>: cmpl   $0x3, -0x8(%rbp)
>       [51] 0x40052d <+32>: addl   $0x1, -0x8(%rbp)
>       [50] 0x40052a <+29>: addl   %eax, -0x4(%rbp)
>       a.out`foo()
>       [49] 0x400567 <+15>: retq
>       [48] 0x400566 <+14>: popq   %rbp
>       [47] 0x400563 <+11>: movl   -0x4(%rbp), %eax
>       [46] 0x40055c <+4>: movl   $0x2a, -0x4(%rbp)
>       ...
>       [1] 0x400559 <+1>: movq   %rsp, %rbp
>       [0] 0x400558 <+0>: pushq  %rbp
>
>     // Format:
>     // [instruction index] <instruction disassembly>
>     ```
>
>     Notice the resemblance to loading a core file, but in this case we can
>     get the control flow, printed in reverse order in this example.
>
> *   Decoding: LLDB can use libipt
> ([https://github.com/intel/libipt](https://github.com/intel/libipt)), the
> low-level Intel PT decoding library, to convert trace files into
> instructions.
>
> *   Showing instructions: LLDB can output the list of instructions of the
> control flow, as shown above.
>
> *   Showing function calls: Similarly, LLDB can print a hierarchical view of
> the function calls. A flow like this should be possible:
>
>     ```
>     $ trace load /path/to/trace
>     $ trace dump --function-calls
>     pid: '1234', tid: '1981309'
>       [50]     a.out`bar()         0x40052a
>       [45]       a.out`zaz()       0x400558
>       [40]     a.out`baz()         0x400559
>       [30]   a.out`foo()           0x400567
>       [0]  a.out`main              0x400000
>     ```
>
>     This functionality allows LLDB to reconstruct the call stack at any
>     point and potentially do reverse debugging.
>
> *   Capturing: LLDB can also do the Intel PT capturing of a live process, so
> that at any stop the user can do reverse stepping or simply inspect the
> trace. A possible flow is:
>
>     ```
>     $ <stopped at main>
>     $ b main.cpp:50
>     $ trace start intel-pt // this initiates the tracing
>     $ continue
>     $ <stopped at main.cpp:50>
>     $ trace dump --instructions
>     pid: '1234', tid: '1981309'
>       a.out`main
>       [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>       a.out`bar()
>       [56] 0x40053b <+46>: retq
>       [55] 0x40053a <+45>: leave
>     ```
>
> *   Displaying time information: If the trace contains timing information,
> we could also display it along with each instruction, e.g.
>
>     ```
>     a.out`bar()
>     [56: 1600284226]: 0x40053b <+46>: retq
>     ...
>     [4:  1600284200]: 0x40053a <+45>: leave
>     // Format:
>     // [instruction index: unix timestamp] <instruction disassembly>
>     ```
>
>     Furthermore, we could display the time spent in each function.
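>
> As a sketch of how the time spent in each function could be derived, assume
> each decoded instruction carries a hypothetical `(timestamp, function)` pair,
> in execution order, as in the timed dump above. Charging the interval between
> two consecutive instructions to the earlier one's function gives an exclusive
> (self) time per function. This is an approximation: real traces are likely to
> carry timing information less often than once per instruction.
>
> ```python
> from collections import defaultdict
> from typing import Dict, List, Tuple
>
> def time_per_function(trace: List[Tuple[int, str]]) -> Dict[str, int]:
>     """trace holds one (timestamp, function) pair per instruction, in execution order.
>
>     The interval between two consecutive instructions is charged to the function
>     of the earlier one, which yields exclusive (self) time per function.
>     """
>     totals: Dict[str, int] = defaultdict(int)
>     for (start, function), (end, _) in zip(trace, trace[1:]):
>         totals[function] += end - start
>     return dict(totals)
> ```
>
> For `[(100, "main"), (103, "foo"), (110, "foo"), (111, "main")]` this
> charges 3 time units to `main` and 8 to `foo`; the last instruction gets
> nothing because no later timestamp bounds it.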
>
> # Future LLDB features
>
> *   Reverse Stepping: With the hierarchical reconstruction of the function 
> calls, along with the individual instructions, LLDB can offer reverse 
> stepping. Operations like reverse-next, reverse-step-out, reverse-continue 
> could work by traversing the trace. We plan to work on this once the features 
> presented above are in place.
>
> *   Trace-based profiling
>
> *   SB API of the mentioned features
>
> # Why is this useful?
>
> *   Bug root-causing:
>
>     *   For example, a crash in a production Release build ends up being
> analyzed with logs, a coredump, and a stack trace. Logs are not
> comprehensive, and a stack trace only contains the final state of the
> program. Providing the user with the control flow of the last milliseconds
> gives a tremendous amount of information that is game-changing in
> root-causing issues. It could be said that the user goes from a single stack
> trace to a list of stack traces.
>
>     *   Reverse stepping enables more efficient debugging, as it reduces the
> number of iterations needed to root-cause bugs. More often than not,
> reproducing a bug takes a considerable amount of time, and the user needs to
> reproduce it several times until the correct breakpoints are hit. Giving the
> user the information of what has been executed so far can help them figure
> out where to place a breakpoint, or very easily figure out what went wrong.
>
> *   Low cost: unlike other similar technologies, Intel PT has an almost
> negligible performance cost regardless of whether the build is optimized or
> not, making it appealing for a wide range of scenarios.
>
> *   This infrastructure can be used for enabling other tools like
> non-sample-based profilers with instruction-level accuracy, security
> analyzers that check whether certain memory regions are executed, and trace
> comparators, which could find bugs by comparing similar traces.
>
> # Goals of this document:
>
> *   Gather feedback on the basic Trace implementation, which would include 
> the following basic operations: loading, decoding, and dumping.
>
> *   Create awareness about this work.
>
> *   Get a green light on the current set of patches implementing this feature 
> starting with https://reviews.llvm.org/D85705.
>
> # Non-Goals:
>
> *   Discuss how reverse-stepping will be implemented. This can be left for 
> another discussion. Once the Trace architecture is in place and robust, 
> reverse-stepping can then be discussed, as it’s a more controversial change 
> than this one.
>
> *   Thoroughly explain Intel PT.
>
> # Existing Tool Support
>
> *   GDB has a basic implementation of the features above 
> ([https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html](https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html))
>  and some ideas are taken from there.
>
> *   Perf is a standalone tool that can do capturing and decoding.
>
> *   The Linux kernel has full support for capturing at the thread, logical
> CPU, or cgroup level.
>
> *   Intel developed a basic version of Intel PT support in LLDB as an 
> external plugin. 
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674), 
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).
>
> # New Trace Commands
>
> Based on this patch
> [https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705), there
> would be a common Trace class along with plug-in implementations.
>
> ## Trace loading
>
> ### $ trace load /path/to/trace/settings/file.json
>
> As decoding a trace requires the images of the object files, the trace files 
> and some CPU information, it’s convenient to have a JSON file that describes 
> an entire trace session. The following JSON schema could be used.
>
> ```
> {
>   "trace": {
>     … // plug-in specific information
>   },
>   "processes": [      // process information common to all trace plug-ins
>     {
>       "pid": integer,
>       "triple": string, // llvm-triple
>       "threads": [
>         {
>           "tid": integer,
>           "traceFile": string
>         }
>       ],
>       "modules": [
>         {
>           "systemPath": string, // original path of the module at runtime
>           "file"?: string, // copy of the file if not available at "systemPath"
>           "loadAddress": string, // string address in hex or decimal form
>           "uuid"?: string
>         }
>       ]
>     }
>   ]
> }
> // Notes:
> // All paths are either absolute or relative to the settings file.
> ```
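>
> A minimal sketch of how a loader could interpret the common part of this
> schema, using Python's standard `json` module; `load_settings` and
> `parse_address` are hypothetical names. Relative `traceFile` and `file` paths
> are resolved against the settings file's directory, as the note requires, and
> `loadAddress` accepts hex (`0x...`) or decimal strings. As a design choice of
> this sketch, `systemPath` is kept verbatim because it describes the machine
> where the trace was captured. The plug-in-specific `"trace"` section is
> passed through untouched.
>
> ```python
> import json
> import os
> from typing import Any, Dict
>
> def parse_address(text: str) -> int:
>     """Parse "0x400000" as hex and "4194304" as decimal."""
>     if text.lower().startswith("0x"):
>         return int(text, 16)
>     return int(text, 10)
>
> def load_settings(path: str) -> Dict[str, Any]:
>     base_dir = os.path.dirname(os.path.abspath(path))
>
>     def resolve(p: str) -> str:
>         # os.path.join discards base_dir when p is already absolute.
>         return os.path.normpath(os.path.join(base_dir, p))
>
>     with open(path) as f:
>         settings = json.load(f)
>     for process in settings["processes"]:
>         for thread in process["threads"]:
>             thread["traceFile"] = resolve(thread["traceFile"])
>         for module in process["modules"]:
>             module["loadAddress"] = parse_address(module["loadAddress"])
>             if "file" in module:  # optional local copy of the image
>                 module["file"] = resolve(module["file"])
>     return settings
> ```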
>
> **Corefiles:**
>
> We plan to extend this schema to support corefiles, but we leave that out of
> this discussion, as it can easily be seen as an extension of this basic
> schema.
>
> **Implementation details:**
>
> To make our first implementation easier, we’ll ask for an individual trace
> file per thread. This is the simplest collection mode for Intel PT.
>
> The entire JSON file will be translated into a Trace object, which contains
> the trace information of each thread and process in it.
>
> Each process in the JSON file will be represented as a new Target. Similarly,
> threads and modules for each target will be created following the JSON file.
> This is very similar to what loading a minidump or coredump does.
>
> Each Target will be associated with a Trace, and multiple targets can share
> the same Trace. The contract is that the Trace is assumed to end at the
> current PC of each thread of the target.
>
> ### $ trace schema <plug-in>
>
> This command prints the JSON schema of the trace settings file for the
> provided plug-in. It would output something similar to this:
>
> ```
> {
>   "trace": {
>     "type": "intel-pt",
>     "pt_cpu": {
>       "vendor": "intel" | "unknown",
>       "family": integer,
>       "model": integer,
>       "stepping": integer
>     }
>   },
>   "processes": [
>     {
>       "pid": integer,
>       "triple": string, // llvm-triple
>       "threads": [
>         {
>           "tid": integer,
>           "traceFile": string
>         }
>       ],
>       "modules": [
>         {
>           "systemPath": string, // original path of the module at runtime
>           "file"?: string, // copy of the file if not available at "systemPath"
>           "loadAddress": string, // string address in hex or decimal form
>           "uuid"?: string
>         }
>       ]
>     }
>   ]
> }
> // Notes:
> // All paths are either absolute or relative to the settings file.
> ```
>
> ### $ trace dump [--verbose] [-t tid1] [-t tid2] ...
>
> Print the trace information corresponding to the provided thread ids of the
> currently selected target, which would mainly include the same information as
> the trace settings file. If no tid is provided, the currently selected thread
> is used. This would be useful for debugging. The information would look like:
>
>   Modules:
>
>     <module info like systemPath, file, load address, uuid, size>
>
>   Threads:
>
>     <thread info like location of trace file, number of instructions (if
>     already decoded), number of function calls (if already decoded)>
>
> If `--verbose` is passed, the original settings.json file is printed as
> well.
>
> ## Decoder-based commands
>
> The following commands require decoding the trace and are of the form
> `trace dump <action> [-t <tid>]`. If no tids are specified, the currently
> selected thread of the current target will be used.
>
> ### $ trace dump --instructions [-t <tid>] [-c <count> = 10] [-o <offset> = 0]
>
> This command would print the last `<count>` instructions, starting at the
> given offset from the last instruction in the trace. The output would be
> similar to that of the `disassemble` command and would include timing
> information if available.
>
> ```
> $ trace dump --instructions -c 5
> pid: '1234', tid: '1981309'
>   a.out`main
>   [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>   a.out`bar()
>   [56] 0x40053b <+46>: retq
>   [55] 0x40053a <+45>: leave
>   [54] error -13. 'no memory mapped at this address'
>   a.out`foo()
>   [53] 0x400567 <+15>: retq
> ```
>
> Repeating the command would continue printing where the last run left off.
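>
> The `-c`/`-o` options and the "continue where the last run left off"
> behavior can be modeled as a cursor over the decoded trace. A minimal sketch
> with a hypothetical `DumpCursor` class: offset 0 is the most recent
> instruction, each page walks backwards, and a repeated command without an
> explicit offset resumes right after the previous page.
>
> ```python
> from typing import List, Optional
>
> class DumpCursor:
>     """Hypothetical pager over decoded instruction indices, newest first."""
>
>     def __init__(self, num_instructions: int):
>         self.num_instructions = num_instructions
>         self.next_offset = 0  # where a repeated command resumes
>
>     def dump(self, count: int = 10, offset: Optional[int] = None) -> List[int]:
>         if offset is None:
>             offset = self.next_offset  # repeated command: continue the last run
>         first = self.num_instructions - 1 - offset  # newest index on this page
>         page = [i for i in range(first, first - count, -1) if i >= 0]
>         self.next_offset = offset + len(page)
>         return page
> ```
>
> With 58 decoded instructions (indices 0 to 57), `dump(5)` returns indices 57
> to 53, matching the `-c 5` example above, and a second `dump(5)` returns 52
> to 48.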
>
> **Implementation details:**
>
> Each instruction output by the decoder is either an actual instruction or an
> error. An error can be caused by a collection error (e.g., an internal CPU
> buffer overflow) or a decoding error (e.g., the image of an object file is
> missing while decoding). These errors represent gaps in the trace, and the
> user should know about them, so we print them accordingly in this dump.
>
> Each instruction (including errors) has an index in the decoded trace, which
> serves as a checkpoint.
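>
> A minimal sketch of this representation, assuming hypothetical `Instruction`
> and `TraceError` types. Every decoded item keeps its list position as its
> index, so errors occupy a slot and indices stay stable. The reverse-order dump
> prints a function header whenever the enclosing function changes (and again
> after a gap), similar to the output above.
>
> ```python
> from dataclasses import dataclass
> from typing import List, Optional, Union
>
> @dataclass
> class Instruction:
>     address: int
>     function: str  # symbolicated name, e.g. "a.out`bar()"
>     text: str      # disassembly, e.g. "retq"
>
> @dataclass
> class TraceError:
>     code: int
>     message: str
>
> Item = Union[Instruction, TraceError]  # the index in the list is the checkpoint
>
> def dump_reverse(items: List[Item], count: int) -> List[str]:
>     """Format the last `count` items, newest first, with a header per function."""
>     lines: List[str] = []
>     current: Optional[str] = None
>     stop = max(len(items) - 1 - count, -1)
>     for index in range(len(items) - 1, stop, -1):
>         item = items[index]
>         if isinstance(item, TraceError):
>             # Gaps keep their index and are printed in place.
>             lines.append(f"[{index}] error {item.code}. '{item.message}'")
>             current = None  # reprint the function header after a gap
>             continue
>         if item.function != current:
>             lines.append(item.function)
>             current = item.function
>         lines.append(f"[{index}] {item.address:#x}: {item.text}")
>     return lines
> ```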
>
> ### $ trace dump --function-calls [-t <tid>] [-c <count> = 10] [-o <offset> = 0] [--flat]
>
> This command would print the hierarchical list of function calls. Similar to
> the `--instructions` option, it would show the last `<count>` function calls
> with the given offset from the last instruction. Timing information would be
> included if available.
>
> ```
> $ trace dump --function-calls
> pid: '1234', tid: '1981309'
>   [50]     a.out`bar()         0x40052a
>   [45]       a.out`zaz()       0x400558
>   [40]     a.out`baz()         0x400559
>   [30]   a.out`foo()           0x400567
>   [0]  a.out`main              0x400000
> ```
>
> Repeating the command would continue printing where the last run left off.
>
> If `--flat` is passed, a flat list would be produced instead of a
> hierarchical view.
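>
> One way to build both views, sketched under the assumption that the decoder
> reports hypothetical `("call", index, function)` and `("ret", index)` events
> in execution order. A stack of active frames gives each call its nesting
> depth, and the hierarchical and `--flat` outputs differ only in indentation.
> The listing is printed newest first, as in the example above. Replaying the
> same stack up to any index is also what would let LLDB reconstruct the call
> stack at that point.
>
> ```python
> from typing import List, Tuple
>
> Call = Tuple[int, int, str]  # (instruction index, depth, function)
>
> def calls_with_depth(events: List[tuple]) -> List[Call]:
>     """Attach a nesting depth to each call event, in execution order."""
>     stack: List[str] = []  # functions whose frames are currently active
>     calls: List[Call] = []
>     for event in events:
>         if event[0] == "call":
>             _, index, function = event
>             calls.append((index, len(stack), function))
>             stack.append(function)
>         elif event[0] == "ret" and stack:
>             # Returns from frames entered before the trace started are ignored.
>             stack.pop()
>     return calls
>
> def render(calls: List[Call], flat: bool = False) -> List[str]:
>     lines = []
>     for index, depth, function in reversed(calls):  # newest first
>         indent = "" if flat else "  " * depth
>         lines.append(f"[{index}] {indent}{function}")
>     return lines
> ```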
>
> ## Capturing command
>
> ### $ trace start <plugin_name> [-t <tid>] [--all] [-b <buffer_size_in_KB>]
>
> This command will start tracing the given thread of the currently selected
> target, or all the threads of that target if `--all` is passed. If `--all` is
> passed, any thread created after this command will also be traced
> automatically.
>
> In addition, the optional `-b` parameter defines the size of each trace
> buffer to be created. I haven’t decided on a default yet, but 1 MB might be
> acceptable: according to Intel, it traces around 1 million instructions on
> average, which is more than enough for a useful analysis.
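>
> The trade-off behind `-b` comes from the trace buffer being circular: once
> it is full, new packets overwrite the oldest ones, so a snapshot only ever
> holds the most recent buffer-size bytes of trace. A minimal sketch with a
> hypothetical `TraceRingBuffer` class (in practice the buffer is filled by the
> CPU and kernel, not by user code like this):
>
> ```python
> class TraceRingBuffer:
>     """Hypothetical fixed-size circular buffer that keeps only the newest bytes."""
>
>     def __init__(self, size_in_kb: int):
>         self.data = bytearray(size_in_kb * 1024)
>         self.pos = 0          # next write position
>         self.wrapped = False  # True once older data has been overwritten
>
>     def write(self, packet: bytes) -> None:
>         for byte in packet:
>             self.data[self.pos] = byte
>             self.pos = (self.pos + 1) % len(self.data)
>             if self.pos == 0:
>                 self.wrapped = True
>
>     def snapshot(self) -> bytes:
>         """Return the contents from oldest to newest, i.e. what a trace file holds."""
>         if not self.wrapped:
>             return bytes(self.data[:self.pos])
>         return bytes(self.data[self.pos:] + self.data[:self.pos])
> ```
>
> With a 1 KB buffer, writing 1000 bytes of `a` and then 100 bytes of `b`
> leaves a snapshot of 924 `a` bytes followed by 100 `b` bytes: the oldest 76
> bytes were overwritten.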
>
> For an initial implementation, the plugin_name parameter will be required
> (e.g. intel-pt). Later, a more automated mechanism for finding the right
> plugin can be implemented.
>
> **Implementation notes:**
>
> There’s already a basic implementation in LLDB as an external plugin, in
> [https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/](https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/),
> created by
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).
> It hasn’t received much attention and has been mostly unmaintained since it
> was created, but it is already capable of tracing a given thread and
> collecting the trace buffer. We plan to reuse that logic, which is already
> working.
>
> A Trace object will be created and associated with the current Target.
>
> Any interaction with the trace, like dumping instructions, will trigger a
> fetch of the most recent trace buffer, unless it hasn’t changed.
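>
> One way to implement "fetch unless it hasn’t changed" is to cache the
> decoded trace per thread, keyed by a counter that changes every time the
> process resumes and stops again (for example, a stop ID). The sketch below
> uses a hypothetical `TraceCache` class with injected `fetch_buffer` and
> `decode` callables; none of these names are LLDB API.
>
> ```python
> from typing import Callable, Dict, List, Tuple
>
> class TraceCache:
>     """Hypothetical per-thread cache that re-fetches only after the process ran."""
>
>     def __init__(self, fetch_buffer: Callable[[int], bytes],
>                  decode: Callable[[bytes], List[int]]):
>         self.fetch_buffer = fetch_buffer  # tid -> raw trace buffer
>         self.decode = decode              # raw buffer -> decoded instructions
>         self.entries: Dict[int, Tuple[int, List[int]]] = {}  # tid -> (stop id, insns)
>
>     def instructions(self, tid: int, stop_id: int) -> List[int]:
>         entry = self.entries.get(tid)
>         if entry is not None and entry[0] == stop_id:
>             return entry[1]  # same stop as last time: the buffer cannot have changed
>         decoded = self.decode(self.fetch_buffer(tid))
>         self.entries[tid] = (stop_id, decoded)
>         return decoded
> ```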
>
> When multiple threads are traced, each one will have its own trace buffer, as
> sharing one buffer among multiple threads requires knowing when each context
> switch happened so that the decoded trace can be split correctly among
> threads. This is beyond the scope of the initial version of this project.
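>
> To illustrate why a shared buffer is harder, here is a sketch of the extra
> step it would need. Given one CPU's timestamped decoded instructions and
> hypothetical context-switch records of the form `(timestamp, tid switched
> in)`, each instruction is attributed to the thread scheduled at its
> timestamp. Without such switch records this attribution is impossible, which
> is why the first version uses one buffer per thread.
>
> ```python
> import bisect
> from collections import defaultdict
> from typing import Dict, List, Tuple
>
> def split_by_thread(
>     instructions: List[Tuple[int, int]],  # (timestamp, address) on one CPU, sorted
>     switches: List[Tuple[int, int]],      # (timestamp, tid switched in), sorted
> ) -> Dict[int, List[int]]:
>     """Attribute each instruction to the thread scheduled on the CPU at that time."""
>     switch_times = [t for t, _ in switches]
>     per_thread: Dict[int, List[int]] = defaultdict(list)
>     for timestamp, address in instructions:
>         # Index of the last context switch at or before this timestamp.
>         i = bisect.bisect_right(switch_times, timestamp) - 1
>         if i < 0:
>             continue  # ran before the first known switch: owner unknown
>         per_thread[switches[i][1]].append(address)
>     return dict(per_thread)
> ```
>
> For instance, with switches to tid 1 at time 10 and to tid 2 at time 20,
> instructions at times 12 and 15 go to tid 1, one at time 25 goes to tid 2,
> and one at time 5 is dropped because its owner is unknown.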
>
> ### $ trace save /path/to/file.json [--copy-images]
>
> This creates a trace bundle for the current process, with its settings saved
> in the given JSON file. By default, it doesn’t copy the images loaded in the
> process unless the `--copy-images` parameter is specified. That parameter is
> useful for analyzing the trace on a machine other than the one where it was
> captured.
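>
> A sketch of what `--copy-images` could do when writing the bundle, assuming
> the module entries from the schema above and a hypothetical `save_bundle`
> helper. The `images/` subdirectory is also an assumption of this sketch.
> Each image is copied next to the settings file, and its `file` field is
> written relative to the settings file, so the whole directory can be moved to
> another machine.
>
> ```python
> import json
> import os
> import shutil
> from typing import Any, Dict
>
> def save_bundle(settings: Dict[str, Any], path: str, copy_images: bool = False) -> None:
>     """Write the settings file; with copy_images, also copy each module's image."""
>     bundle_dir = os.path.dirname(os.path.abspath(path))
>     if copy_images:
>         images_dir = os.path.join(bundle_dir, "images")  # hypothetical layout
>         os.makedirs(images_dir, exist_ok=True)
>         for process in settings["processes"]:
>             for module in process["modules"]:
>                 # Sketch only: two modules with the same basename would collide.
>                 name = os.path.basename(module["systemPath"])
>                 shutil.copy2(module["systemPath"], os.path.join(images_dir, name))
>                 # Stored relative to the settings file, as the schema requires.
>                 module["file"] = os.path.join("images", name)
>     with open(path, "w") as f:
>         json.dump(settings, f, indent=2)
> ```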
>
> # Remote Protocol Changes
>
> No remote protocol changes are required, as
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674) and
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b)
> already added them some years ago.
>
> # Build Requirements
>
> To build LLDB with this support, it has to be linked against a build of
> libipt ([https://github.com/intel/libipt](https://github.com/intel/libipt)),
> which is the decoder.
>
> # Operating System Requirements for Collection/Tracing
>
> Collection can only be done on Linux, and only if the file
> /sys/bus/event_source/devices/intel_pt/type exists. The logic gating this
> feature is already checked in and defined in
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674).
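>
> In Python terms, the gating check amounts to reading that sysfs file. Its
> content is the numeric perf event type of the Intel PT PMU, which is the
> value a collector would place in `perf_event_attr.type` for
> `perf_event_open`. This is a hypothetical sketch for illustration only; the
> real check is the C++ code from D33674. The path is a parameter only so the
> function can be exercised without Intel PT hardware.
>
> ```python
> from typing import Optional
>
> INTEL_PT_TYPE_PATH = "/sys/bus/event_source/devices/intel_pt/type"
>
> def intel_pt_event_type(path: str = INTEL_PT_TYPE_PATH) -> Optional[int]:
>     """Return the perf event type of Intel PT, or None if collection is unsupported."""
>     try:
>         with open(path) as f:
>             return int(f.read().strip())
>     except (OSError, ValueError):
>         # Missing or unreadable file (no Intel PT support) or unexpected content.
>         return None
> ```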
>
> # Testing
>
> Fortunately, this feature is straightforward to test. It’s possible to
> capture traces with perf or with the future “trace start” / “trace save”
> commands and create trace bundles with their corresponding settings .json
> file. Analyzing those traces should give the same results on any machine,
> making testing deterministic.
> [https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705) and its
> descendants already implement some deterministic tests.
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
