Hi everyone, I’m Ismail, a compiler engineer intern at Apple. As a part of my internship, I'm adding Fast Conditional Breakpoints to LLDB, using code patching.
Currently, the expressions that power conditional breakpoints are lowered to LLVM IR and LLDB knows how to interpret a subset of it. If that fails, the debugger JIT-compiles the expression (compiled once, and re-run on each breakpoint hit). In both cases LLDB must collect all program state used in the condition and pass it to the expression.

The goal of my internship project is to make conditional breakpoints faster by:

1. Compiling the expression ahead-of-time, when setting the breakpoint, and injecting it into the inferior memory only once.
2. Re-routing the inferior execution flow to run the expression and check whether it needs to stop, in-process.

This saves the cost of the context switches between the debugger and the inferior program (about 10 of them) needed to compile and evaluate the condition.

This feature is described on the [LLDB Project page](https://lldb.llvm.org/status/projects.html#use-the-jit-to-speed-up-conditional-breakpoint-evaluation).

The goal would be to have it working for most languages and architectures supported by LLDB; however, my initial implementation will be for C-based languages targeting x86_64. It will be extended to AArch64 afterwards.

Note that the way my prototype is implemented makes it fully extensible to other languages and architectures.

## High Level Design

Every time a breakpoint that holds a condition is hit, multiple context switches are needed in order to compile and evaluate the condition.

First, the breakpoint is hit and control is given to the debugger. That's where LLDB wraps the condition expression into a UserExpression that will get compiled and injected into the program memory. Another round-trip between the inferior and LLDB is needed to run the compiled expression and extract the expression results that will tell LLDB whether to stop or not.

To get rid of those context switches, we will evaluate the condition inside the program, and only stop when the condition is true.
LLDB will achieve this by inserting a jump from the breakpoint address to a code section that will be allocated in the program memory. That code section will save the thread state, run the condition expression, restore the thread state and then execute the copied instruction(s) before jumping back to the regular program flow. Then we only trap and return control to LLDB when the condition is true.

## Implementation Details

To be able to evaluate a breakpoint condition without interacting with the debugger, LLDB changes the inferior program execution flow by overwriting the instruction at which the breakpoint was set with a branching instruction.

The original instruction(s) are copied to a memory stub allocated in the inferior program memory called the __Fast Conditional Breakpoint Trampoline__ or __FCBT__. The FCBT will allow us to re-route the program execution flow to check the condition in-process while preserving the original program behavior. This part is critical to set up Fast Conditional Breakpoints.

```
      Inferior Binary                                    Trampoline

|            .            |                      +-------------------------+
|            .            |                      |                         |
|            .            |           +--------->+   Save RegisterContext  |
|            .            |           |          |                         |
+-------------------------+           |          +-------------------------+
|                         |           |          |                         |
|       Instruction       |           |          |  Build Arguments Struct |
|                         |           |          |                         |
+-------------------------+           |          +-------------------------+
|                         +-----------+          |                         |
|   Branch to Trampoline  |                      |  Call Condition Checker |
|                         +<----------+          |                         |
+-------------------------+           |          +-------------------------+
|                         |           |          |                         |
|       Instruction       |           |          | Restore RegisterContext |
|                         |           |          |                         |
+-------------------------+           |          +-------------------------+
|            .            |           |          |                         |
|            .            |           +----------+ Run Copied Instructions |
|            .            |                      |                         |
|            .            |                      +-------------------------+
```

Once the execution reaches the Trampoline, several steps need to be taken.

LLDB relies on its UserExpressions to JIT-compile these more complex conditional expressions.
However, since the execution will be handled by the debugged program, LLDB will generate some code ahead-of-time in the Trampoline that will allow the inferior to initialize the expression's argument structure.

Generating the condition checker, as well as the code that initializes the argument structure on each breakpoint hit, is handled by the __BreakpointInjectedSite__ class, which builds the conditional expression for all the BreakpointLocations, emits the `$__lldb_expr` function, and relocates variables in the `$__lldb_arg` structure.

BreakpointInjectedSites are created in the __Process__ if the user enables the `-I | --inject-condition` flag when setting or modifying a breakpoint.

Because the __FCBT__ is architecture specific, BreakpointInjectedSites will only be available when a target has added support for it, in the matching Architecture Plugin.

Several parts of lldb have to be modified to implement this feature:

- **Breakpoint**: Added BreakpointInjectedSite, and helper functions to the related classes (Breakpoint, BreakpointLocation, BreakpointSite, BreakpointOptions)
- **Plugins**:
  - Added ObjectFileTrampoline for the unwinding
  - Added x86_64 ABI support (FCBT setup & safety checks)
- **Symbol**: Changed `FuncUnwinders` and `UnwindPlan` to support FCBT
- **Target**:
  - Added BreakpointInjectedSite creation to `Process` to insert the jump to the FCBT
  - Added the Trampoline module creation to `ABI` for the unwinding

### Breakpoint Option

Since Fast Conditional Breakpoints are still under development, they will not be on by default; instead, we will provide a flag to "breakpoint set" and "breakpoint modify" to enable the feature. Note that the end goal is to have them as the default and only fall back to regular conditional breakpoints on unsupported architectures.

They can be enabled with the `-I | --inject-condition` option.
The feature can also be enabled through the Python Scripting Bridge public API, by calling the `InjectCondition(bool enable)` method on an __SBBreakpoint__ or __SBBreakpointLocation__ object.

This feature is intended to be used with condition expressions (`-c <expr> | --condition <expr>`), but also with other condition types such as:

- Thread ID (`-t <thread-id> | --thread-id <thread-id>`)
- Thread Index (`-x <thread-index> | --thread-index <thread-index>`)
- Thread Queue Name

### Trampoline

To be able to inject the condition, we need to re-route the debugged program's execution flow. This part is handled in the __Trampoline__, a memory stub allocated in the inferior that will contain the condition check while preserving the program's original behavior.

The trampoline is architecture specific and built by lldb. To have the condition evaluation work out-of-place, several steps need to be completed:

1. Save all the registers by pushing them to the stack
2. Build the `$__lldb_arg` structure by calling an injected UtilityFunction
3. Check the condition by calling the injected UserExpression and execute a trap if the condition is true
4. Restore the register context
5. Rewrite the operands of the original copied instructions and run them

All the values needed for these steps can be computed ahead of time, when the breakpoint is set (e.g. size of the allocation, jump address, relocation ...).

Since the x86_64 ISA has variable instruction sizes, LLDB moves enough instructions into the trampoline to be able to overwrite them with a jump to the trampoline.

Also, the allocation region for the trampoline might be too far away for a single jump, so we might need to have several branch islands before reaching the trampoline (WIP).

### BreakpointInjectedSite

To handle the Fast Conditional Breakpoint setup, LLDB uses __BreakpointInjectedSites__, a sub-class of the BreakpointSite class. BreakpointInjectedSites use different `UserExpression`s to resolve variables and inject the condition checker.
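
As an aside on the trampoline setup described above, the two x86_64 constraints (copying only whole instructions, and keeping the trampoline within reach of a single jump) can be sketched as follows. This is an illustration only, not code from the patches: it assumes the branch is the 5-byte `jmp rel32` encoding (opcode `0xE9`), and `BytesToCopy` and `EncodeJmpRel32` are hypothetical helper names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Size of the x86_64 near jump `E9 rel32`: 1 opcode byte + 4 displacement bytes.
constexpr std::size_t kJmpRel32Size = 5;

// Hypothetical helper: given the sizes of the instructions that start at the
// breakpoint address, return how many bytes must move to the trampoline so
// that only whole instructions are overwritten by the jump.
// Returns std::nullopt when the listed instructions do not cover the jump.
std::optional<std::size_t>
BytesToCopy(const std::vector<std::size_t> &instruction_sizes) {
  std::size_t total = 0;
  for (std::size_t size : instruction_sizes) {
    total += size;
    if (total >= kJmpRel32Size)
      return total;
  }
  return std::nullopt;
}

// Hypothetical helper: encode a `jmp rel32` placed at `from` that targets `to`.
// Returns std::nullopt when the target is out of range for a single jump,
// which is the case where branch islands would be needed.
std::optional<std::array<std::uint8_t, kJmpRel32Size>>
EncodeJmpRel32(std::uint64_t from, std::uint64_t to) {
  // The displacement is relative to the address of the next instruction.
  const auto disp = static_cast<std::int64_t>(to - (from + kJmpRel32Size));
  if (disp < std::numeric_limits<std::int32_t>::min() ||
      disp > std::numeric_limits<std::int32_t>::max())
    return std::nullopt;
  const auto bits = static_cast<std::uint32_t>(disp);
  return std::array<std::uint8_t, kJmpRel32Size>{
      0xE9, // opcode of `jmp rel32`
      static_cast<std::uint8_t>(bits & 0xFF),         // little-endian rel32
      static_cast<std::uint8_t>((bits >> 8) & 0xFF),
      static_cast<std::uint8_t>((bits >> 16) & 0xFF),
      static_cast<std::uint8_t>((bits >> 24) & 0xFF)};
}
```

A `std::nullopt` from `EncodeJmpRel32` corresponds to the case where the trampoline was allocated too far away for a single jump, and a `std::nullopt` from `BytesToCopy` to the case where there are not enough instruction bytes to hold the branch.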
#### Condition Checker

Because a BreakpointSite can have multiple BreakpointLocations with different conditions, LLDB first needs to iterate over each owner of the BreakpointSite and gather all the conditions. If one of the BreakpointLocations doesn't have a condition or the condition is not set to be injected, the BreakpointInjectedSite will behave as a regular BreakpointSite.

Once all the conditions are fetched, LLDB will create a __UserExpression__ with the injected trap instruction.

When a trap is hit, LLDB uses the __BreakpointSiteList__, a map from a trap address to a BreakpointSite, to identify where to stop. To allow LLDB to catch the injected trap at runtime, it will disassemble the compiled expression and scan for the trap address. The injected trap address is then added to LLDB's __BreakpointSiteList__.

When generated, this is what the condition checker looks like:

```cpp
void $__lldb_expr(void *$__lldb_arg)
{
    /*lldb_BODY_START*/
    if (condition) {
        __builtin_debugtrap();
    };
    /*lldb_BODY_END*/
}
```

#### Argument Builder

The conditional expression will often refer to local variables, and the references to these variables need to be tied to their instances in the current frame.

Usually the expression evaluator invokes the __Materializer__, which fetches the variable values and fills the `$__lldb_arg` structure. But since we don't want to switch contexts, LLDB has to resolve the used variables by generating code that will initialize the `$__lldb_arg` pointer before running the condition checker.

That's where the __Argument Builder__ comes in.

The argument builder uses a `UtilityFunction` to generate the `$__lldb_create_args_struct` function. It is called by the Trampoline before the condition checker, in order to resolve the variables used in the condition expression.

`$__lldb_create_args_struct` will fill the `$__lldb_arg` in several steps:

1. It takes advantage of the fact that LLDB saved all the registers to the stack and maps them to a `register_context` structure.

   ```cpp
   typedef struct {
       // General Purpose Registers
   } register_context;
   ```

2. Using information from the variable resolver, it allocates a memory stub that will contain the used variable addresses.
3. Then, it will use the register values and the collected metadata to compute the used variable addresses and write them into the newly allocated structure.
4. Finally, the allocated structure is returned to the trampoline, which will pass it as an argument to the injected condition checker.

Since `$__lldb_create_args_struct` uses the same JIT Engine as the UserExpression, LLDB will parse, build and insert it in the program memory.

#### Variable Resolver

When creating a Fast Conditional Breakpoint, the __debug info__ tells us where the used variables are located. Using this information and the saved register context, we can generate code that will resolve the variables at runtime (__Step 3 of the Argument Builder__).

LLDB will first get the `DeclMap` from the condition UserExpression and pull a list of the used variables. While iterating over that list, LLDB extracts each variable's __DWARF Expression__. DWARF expressions explain how to reconstruct a variable's value using DWARF operations.

LLDB needs the register context because local variables are often at an offset from the __Stack Base Pointer register__ or written across one or multiple registers. This is why I've only focused on `DW_OP_fbreg` expressions so far, since I could get the offset of the variable and add it to the base pointer register to get its address.

The variable address, and other metadata such as its size, its identifier and the DWARF Expression, are saved to an `ArgumentMetadata` vector that will be used by the `ArgumentBuilder` to create the `$__lldb_arg` structure.
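
The `DW_OP_fbreg` case described above can be sketched as follows. This is a minimal illustration, not the prototype's code: it assumes the frame base is the saved `rbp` value (typical for x86_64 code built at -O0, but an assumption here), and the fields of `register_context` and `ArgumentMetadata` as well as the function names `ResolveFbreg` and `BuildArgs` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the registers spilled by the trampoline.
struct register_context {
  std::uint64_t rbp; // assumed to be the frame base used by DW_OP_fbreg
  std::uint64_t rsp;
};

// Hypothetical shape of the per-variable metadata gathered when the
// breakpoint is set.
struct ArgumentMetadata {
  std::string name;          // variable identifier
  std::size_t size;          // size of the variable in bytes
  std::int64_t fbreg_offset; // signed operand of DW_OP_fbreg
};

// Address of one variable: frame base plus the signed DW_OP_fbreg offset.
std::uint64_t ResolveFbreg(const register_context &regs,
                           const ArgumentMetadata &var) {
  return regs.rbp + static_cast<std::uint64_t>(var.fbreg_offset);
}

// Collect the address of every variable used by the condition, in order.
std::vector<std::uint64_t>
BuildArgs(const register_context &regs,
          const std::vector<ArgumentMetadata> &vars) {
  std::vector<std::uint64_t> addresses;
  addresses.reserve(vars.size());
  for (const ArgumentMetadata &var : vars)
    addresses.push_back(ResolveFbreg(regs, var));
  return addresses;
}
```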
Since all the registers are already mapped to a structure, I should be able to support more __DWARF Operations__ in the future.

After collecting some metrics on the __Clang__ binary, built at __-O0__, the debug info shows that four DWARF Operations account for more than __99%__ of all occurrences:

| DWARF Operation | Occurrences   |
|-----------------|---------------|
| `DW_OP_fbreg`   | 2 114 612     |
| `DW_OP_reg`     | 820 548       |
| `DW_OP_constu`  | 267 450       |
| `DW_OP_addr`    | 17 370        |
| __Top 4__       | __3 219 980__ |
| __Total__       | __3 236 859__ |

Those 4 operations are the ones that I'll support for now. To support more complex expressions, we would need to JIT-compile a DWARF expression interpreter.

### Unwinders

When the program hits the injected trap instruction, the execution stops inside the injected UserExpression.

```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001001070b9 $__lldb_expr`$__lldb_expr($__lldb_arg=0x00000001f5671000) at lldb-33192c.expr:49
    frame #1: 0x0000000100105028
```

This part of the program should be transparent to the user. To allow LLDB to elide the condition checker and the FCBT frame, the Unwinder needs to be able to identify all of the frames, up to the user's source code frame.

The injected UserExpression already has a valid stack frame, but it doesn't have any information about its caller, the Trampoline. In order to unwind to the user's code, LLDB needs symbolic information for the trampoline.

This information is tied to LLDB modules, created using an ObjectFile representation, the __ObjectFileTrampoline__ in our case. It will contain several pieces of information such as the module's name and description, but most importantly the module __Symbol Table__ that will have the trampoline symbol (`$__lldb_injected_conditional_bp_trampoline`) and a __Text Section__ that will tell the unwinder the trampoline bounds.
Then, LLDB inserts a __Function Unwinder__ in the module UnwindTable and creates an __Unwind Plan__ pointing to the BreakpointLocation return address. This is done taking into consideration that the trampoline will alter the memory layout by spilling registers to the stack.

Finally, the newly created module is appended to the target image list, which allows LLDB to move between the injected code and the user code seamlessly.

This is what the backtrace looks like after hitting the injected trap:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001001070b9 $__lldb_expr`$__lldb_expr($__lldb_arg=0x00000001f4c71000) at lldb-ca98b7.expr:49
    frame #1: 0x0000000100105028 $__lldb_injected_conditional_bp_trampoline`$__lldb_injected_conditional_bp_trampoline + 40
  * frame #2: 0x0000000100000f5b main`main at main.c:7:23
```

For now, LLDB selects the user frame, but the goal would be to mask all the frames introduced by the Fast Conditional Breakpoint. A `debug-injected-condition` setting will allow the user to stop at the FCBT and show all the elided frames.

### Instruction Shifter (WIP)

Because some instructions might use operands that are at offsets relative to the program counter, copying the instructions to a new location might change their meaning: LLDB needs to patch each such instruction with the right offset. This is done using `llvm::MCInst` to detect the instructions that need to be rewritten.

## Risk Mitigation

The optimization relies heavily on code injection, most of which is architecture specific. Because of this, overwriting the instructions can fail depending on the breakpoint location, e.g.:

- If the overwritten instructions contain indirection (branch instructions).
- If the overwritten instructions are a branch target.
- If there are not enough instructions to insert the branch instruction (x86_64).

If the setup process fails to insert the Fast Conditional Breakpoint, it will fall back to the legacy behavior, and warn the user about what went wrong.

One way to mitigate those limitations would be to use code instrumentation to detect whether it's safe to set a Fast Conditional Breakpoint at a certain location, and hint the user to move the FCB before or after the location where it was originally set.

## Prototype Code

I submitted my patches ([1](https://reviews.llvm.org/D66248), [2](https://reviews.llvm.org/D66249), [3](https://reviews.llvm.org/D66250)) on Phabricator with the prototype.

## Feedback

Before moving forward I'd like to get the community's input. What do you think about this approach? Any feedback would be greatly appreciated!

Thanks,

--
Mohamed Ismail Bennani
Compiler Engineer Intern - Apple Inc.

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev