>> Hi Pavel,
>> Thanks for all your feedback.
>> I’ve been following the discussion closely and find your approach quite 
>> interesting.
>> As Jim explained, I’m also trying to have a conditional breakpoint, that is 
>> able to stop a specific thread (name or id) when the condition expression 
>> evaluates to true.
>> I feel like stacking up options with your approach would imply doing more 
>> context switches.
>> But it’s definitely a better fallback mechanism than the current one. I’ll 
>> try to make a prototype to see the performance difference for both 
>> approaches.
>>> On Aug 15, 2019, at 10:10 AM, Pavel Labath <pa...@labath.sk> wrote:
>>> Hello Ismail, and welcome to LLDB. You have a very interesting (and not 
>>> entirely trivial) project, and I wish you the best of luck in your work. I 
>>> think this will be a very useful addition to lldb.
>>> It sounds like you have researched the problem very well, and the overall 
>>> direction looks good to me. However, I do have some ideas and suggestions about 
>>> possible tweaks/improvements that I would like to hear your thoughts on. 
>>> Please find my comments inline.
>>> On 14/08/2019 22:52, Ismail Bennani via lldb-dev wrote:
>>>> Hi everyone,
>>>> I’m Ismail, a compiler engineer intern at Apple. As a part of my 
>>>> internship,
>>>> I'm adding Fast Conditional Breakpoints to LLDB, using code patching.
>>>> Currently, the expressions that power conditional breakpoints are lowered
>>>> to LLVM IR and LLDB knows how to interpret a subset of it. If that fails,
>>>> the debugger JIT-compiles the expression (compiled once, and re-run on each
>>>> breakpoint hit). In both cases LLDB must collect all program state used in
>>>> the condition and pass it to the expression.
>>>> The goal of my internship project is to make conditional breakpoints 
>>>> faster by:
>>>> 1. Compiling the expression ahead-of-time, when setting the breakpoint, and
>>>>   injecting it into the inferior memory only once.
>>>> 2. Re-routing the inferior execution flow to run the expression and check,
>>>>   in-process, whether it needs to stop.
>>>> This saves the cost of having to do the context switch between debugger and
>>>> the inferior program (about 10 times) to compile and evaluate the 
>>>> condition.
>>>> This feature is described on the [LLDB Project 
>>>> page](https://lldb.llvm.org/status/projects.html#use-the-jit-to-speed-up-conditional-breakpoint-evaluation).
>>>> The goal would be to have it working for most languages and architectures
>>>> supported by LLDB, however my original implementation will be for C-based
>>>> languages targeting x86_64. It will be extended to AArch64 afterwards.
>>>> Note the way my prototype is implemented makes it fully extensible for 
>>>> other
>>>> languages and architectures.
>>>> ## High Level Design
>>>> Every time a breakpoint that holds a condition is hit, multiple context
>>>> switches are needed in order to compile and evaluate the condition.
>>>> First, the breakpoint is hit and the control is given to the debugger.
>>>> That's where LLDB wraps the condition expression into a UserExpression that
>>>> will get compiled and injected into the program memory. Another round-trip
>>>> between the inferior and the LLDB is needed to run the compiled expression
>>>> and extract the expression results that will tell LLDB to stop or not.
>>>> To get rid of those context switches, we will evaluate the condition inside
>>>> the program, and only stop when the condition is true. LLDB will achieve 
>>>> this
>>>> by inserting a jump from the breakpoint address to a code section that will
>>>> be allocated into the program memory. It will save the thread state, run 
>>>> the
>>>> condition expression, restore the thread state and then execute the copied
>>>> instruction(s) before jumping back to the regular program flow.
>>>> Then we only trap and return control to LLDB when the condition is true.
>>>> ## Implementation Details
>>>> To be able to evaluate a breakpoint condition without interacting with the
>>>> debugger, LLDB changes the inferior program execution flow by overwriting
>>>> the instruction at which the breakpoint was set with a branching 
>>>> instruction.
>>>> The original instruction(s) are copied to a memory stub allocated in the
>>>> inferior program memory called the __Fast Conditional Breakpoint 
>>>> Trampoline__
>>>> or __FCBT__. The FCBT will allow us to re-route the program execution flow to
>>>> check the condition in-process while preserving the original program behavior.
>>>> This part is critical to set up Fast Conditional Breakpoints.
>>>> ```
>>>>      Inferior Binary                                     Trampoline
>>>> |            .            |                      +-------------------------+
>>>> |            .            |                      |                         |
>>>> |            .            |           +--------->+   Save RegisterContext  |
>>>> |            .            |           |          |                         |
>>>> +-------------------------+           |          +-------------------------+
>>>> |                         |           |          |                         |
>>>> |       Instruction       |           |          |  Build Arguments Struct |
>>>> |                         |           |          |                         |
>>>> +-------------------------+           |          +-------------------------+
>>>> |                         +-----------+          |                         |
>>>> |   Branch to Trampoline  |                      |  Call Condition Checker |
>>>> |                         +<----------+          |                         |
>>>> +-------------------------+           |          +-------------------------+
>>>> |                         |           |          |                         |
>>>> |       Instruction       |           |          | Restore RegisterContext |
>>>> |                         |           |          |                         |
>>>> +-------------------------+           |          +-------------------------+
>>>> |            .            |           |          |                         |
>>>> |            .            |           +----------+ Run Copied Instructions |
>>>> |            .            |                      |                         |
>>>> |            .            |                      +-------------------------+
>>>> ```
>>>> Once the execution reaches the Trampoline, several steps need to be taken.
>>>> LLDB relies on its UserExpressions to JIT these more complex conditional
>>>> expressions. However, since the execution will be handled by the debugged
>>>> program, LLDB will generate some code ahead-of-time in the Trampoline that
>>>> will allow the inferior to initialize the expression's argument structure.
>>>> Generating the condition checker as well as the code to initialize
>>>> the argument structure of each breakpoint hit is handled by
>>>> __BreakpointInjectedSite__ class, which builds the conditional expression 
>>>> for
>>>> all the BreakpointLocations, emits the `$__lldb_expr` function, and 
>>>> relocates
>>>> variables in the `$__lldb_arg` structure.
>>>> BreakpointInjectedSites are created in the __Process__ if the user enables
>>>> the `-I | --inject-condition` flag when setting or modifying a breakpoint.
>>>> Because the __FCBT__ is architecture specific, BreakpointInjectedSites will
>>>> only be available when a target has added support to it, in the matching
>>>> Architecture Plugin.
>>>> Several parts of lldb have to be modified to implement this feature:
>>>> - **Breakpoint**: Added BreakpointInjectedSite, and helper functions to the
>>>>                  related class (Breakpoint, BreakpointLocation,
>>>>                  BreakpointSite, BreakpointOptions)
>>>> - **Plugins**:    Added ObjectFileTrampoline for the unwinding
>>>>                  Added x86_64 ABI support (FCBT setup & safety checks)
>>>> - **Symbol**:     Changed `FuncUnwinders` and `UnwindPlan` to support FCBT
>>>> - **Target**:     Added BreakpointInjectedSite creation to `Process` to 
>>>> insert
>>>>                  the jump to the FCBT
>>>>                  Added the Trampoline module creation to `ABI` for the
>>>>                  unwinding
>>>> ### Breakpoint Option
>>>> Since Fast Conditional Breakpoints are still under development, they will not
>>>> be on by default, but rather we will provide a flag to "breakpoint set" and
>>>> "breakpoint modify" to enable the feature. Note that the end-goal is to have
>>>> them as a default and only fall back to regular conditional breakpoints on
>>>> unsupported architectures.
>>>> They can be enabled using the `-I | --inject-condition` option. This option
>>>> can also be enabled through the Python Scripting Bridge public API, using the
>>>> `InjectCondition(bool enable)` method on an __SBBreakpoint__ or
>>>> __SBBreakpointLocation__ object.
>>>> This feature is intended to be used with a condition expression
>>>> (`-c <expr> | --condition <expr>`), but also with other condition types such as:
>>>> - Thread ID (`-t <thread-id> | --thread-id <thread-id>`)
>>>> - Thread Index (`-x <thread-index> | --thread-index <thread-index>`)
>>>> - Thread Queue Name
>>>> ### Trampoline
>>>> To be able to inject the condition, we need to re-route the debugged program's
>>>> execution flow. This part is handled in the __Trampoline__, a memory stub
>>>> allocated in the inferior that will contain the condition check while
>>>> preserving the program's original behavior.
>>>> The trampoline is architecture specific and built by lldb. To have the
>>>> condition evaluation work out-of-place, several steps need to be completed:
>>>> 1. Save all the registers by pushing them to the stack
>>>> 2. Build the `$__lldb_arg` structure by calling an injected UtilityFunction
>>>> 3. Check the condition by calling the injected UserExpression and execute a
>>>>   trap if the condition is true.
>>>> 4. Restore register context
>>>> 5. Rewrite the operands of the copied original instructions, then run them
>>>> All the values needed for the steps can be computed ahead of time, when the
>>>> breakpoint is set (e.g. size of the allocation, jump address, relocation ...).
>>>> Since the x86_64 ISA has variable instruction size, LLDB moves enough
>>>> instructions into the trampoline to be able to overwrite them with a jump to
>>>> the trampoline. Also, the allocation region for the trampoline might be too far
>>>> away for a single jump, so we might need to have several branch islands before
>>>> reaching the trampoline (WIP).
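
A minimal sketch of the two checks described above, assuming the patch uses a 5-byte `jmp rel32` and that the instruction lengths at the breakpoint address come from a disassembler. `BytesToDisplace` and `JumpDisplacement` are hypothetical names for illustration, not functions from the patches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Size in bytes of an x86_64 `jmp rel32` (opcode 0xE9 + 32-bit displacement).
constexpr std::size_t kJmpRel32Size = 5;

// Hypothetical helper: given the byte lengths of the instructions starting at
// the breakpoint address, return how many bytes have to be copied to the
// trampoline so that only whole instructions are overwritten by the jump.
// Returns std::nullopt when the instructions run out before 5 bytes are covered.
std::optional<std::size_t>
BytesToDisplace(const std::vector<std::size_t> &insn_lengths) {
  std::size_t covered = 0;
  for (std::size_t len : insn_lengths) {
    assert(len > 0 && "instruction lengths must be non-zero");
    covered += len;
    if (covered >= kJmpRel32Size)
      return covered;
  }
  return std::nullopt;
}

// Hypothetical helper: displacement for a `jmp rel32` located at `from` that
// targets `to`. The displacement is relative to the end of the jump.
// Returns std::nullopt when the target is outside the signed 32-bit range,
// which is the case where a branch island would be needed.
std::optional<std::int32_t> JumpDisplacement(std::uint64_t from,
                                             std::uint64_t to) {
  const auto disp = static_cast<std::int64_t>(to - (from + kJmpRel32Size));
  if (disp < std::numeric_limits<std::int32_t>::min() ||
      disp > std::numeric_limits<std::int32_t>::max())
    return std::nullopt;
  return static_cast<std::int32_t>(disp);
}
```

With instruction lengths 1, 3, 2 and 4, the first three instructions (6 bytes) would have to move, since the first two only cover 4 bytes.
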
>>>> ### BreakpointInjectedSite
>>>> To handle the Fast Conditional Breakpoint setup, LLDB uses
>>>> __BreakpointInjectedSites__, a sub-class of the BreakpointSite class.
>>>> BreakpointInjectedSites use different `UserExpression`s to resolve variables
>>>> and inject the condition checker.
>>>> #### Condition Checker
>>>> Because a BreakpointSite can have multiple BreakpointLocations with different
>>>> conditions, LLDB first needs to iterate over each owner of the BreakpointSite and
>>>> gather all the conditions. If one of the BreakpointLocations doesn't have a
>>>> condition or the condition is not set to be injected, the
>>>> BreakpointInjectedSite will behave as a regular BreakpointSite.
>>>> Once all the conditions are fetched, LLDB will create a __UserExpression__
>>>> with the injected trap instruction.
>>>> When a trap is hit, LLDB uses the __BreakpointSiteList__, a map from a trap
>>>> address to a BreakpointSite to identify where to stop. To allow LLDB to 
>>>> catch
>>>> the injected trap at runtime, it will disassemble the compiled expression 
>>>> and
>>>> scan for the trap address. The injected trap address is then added to 
>>>> LLDB's
>>>> __BreakpointSiteList__.
>>>> When generated, this is what the condition checker looks like:
>>>> ```cpp
>>>> void $__lldb_expr(void *$__lldb_arg)
>>>> {
>>>>    /*lldb_BODY_START*/
>>>>    if (condition) {
>>>>        __builtin_debugtrap();
>>>>    };
>>>>    /*lldb_BODY_END*/
>>>> }
>>>> ```
>>>> #### Argument Builder
>>>> The conditional expression will often refer to local variables, and the
>>>> references to these variables need to be tied to the instances of them in 
>>>> the
>>>> current frame.
>>>> Usually the expression evaluator invokes the __Materializer__ which fetches
>>>> the variables values and fills the `$__lldb_arg` structure. But since we 
>>>> don't
>>>> want to switch contexts, LLDB has to resolve used variables by generating 
>>>> code
>>>> that will initialize the `$__lldb_arg` pointer, before running the 
>>>> condition
>>>> checker.
>>>> That's where the __Argument Builder__ comes in.
>>>> The argument builder uses a `UtilityFunction` to generate the
>>>> `$__lldb_create_args_struct` function. It is called by the Trampoline
>>>> before the condition checker, in order to resolve variables used in the
>>>> condition expression.
>>>> `$__lldb_create_args_struct` will fill the `$__lldb_arg` in several steps:
>>>> 1. It takes advantage of the fact that LLDB saved all the registers to the
>>>>   stack and maps them to a `register_context` structure.
>>>>    ```cpp
>>>>    typedef struct {
>>>>      // General Purpose Registers
>>>>    } register_context;
>>>>    ```
>>>> 2. Using information from the variable resolver, it allocates a memory stub
>>>>   that will contain the used variable addresses.
>>>> 3. Then, it will use the register values and the collected metadata to
>>>>   compute the used variable address and write that into the
>>>>   newly allocated structure.
>>>> 4. Finally the allocated structure is returned to the trampoline, which 
>>>> will
>>>>   pass it as an argument to the injected condition checker.
>>> I am wondering whether we really need to involve the memory allocation 
>>> functions here. What's the size of this address structure? I would expect 
>>> it to be relatively small compared to the size of the entire register 
>>> context that we have just saved to the stack. If that's the case, then 
>>> maybe we could have the trampoline allocate some space on the stack 
>>> and pass that as an argument to the $__lldb_arg building code.
>> Allocating the $__lldb_arg struct in the stack is on my to-do list. This 
>> will change in the coming revisions.
>>>> Since `$__lldb_create_args_struct` uses the same JIT Engine as the
>>>> UserExpression, LLDB will parse, build and insert it in the program memory.
>>>> #### Variable Resolver
>>>> When creating a Fast Conditional Breakpoint, the __debug info__ tells us
>>>> where the used variables are located. Using this information and the saved
>>>> register context, we can generate code that will resolve the variables at
>>>> runtime (__Step 3 of the Argument Builder__).
>>>> LLDB will first get the `DeclMap` from the condition UserExpression and 
>>>> pull a
>>>> list of the used variables. While iterating on that list, LLDB extracts 
>>>> each
>>>> variable's __DWARF Expression__.
>>>> DWARF expressions explain how to reconstruct a variable's values using 
>>>> DWARF
>>>> operations.
>>>> The reason LLDB needs the register context is that local variables are
>>>> often at an offset from the __Stack Base Pointer register__ or written across
>>>> one or multiple registers. This is why I've only focused on `DW_OP_fbreg`
>>>> expressions since I could get the offset of the variable and add it to the
>>>> base pointer register to get its address. The variable address, and other
>>>> metadata such as its size, its identifier and the DWARF Expression are 
>>>> saved
>>>> to an `ArgumentMetadata` vector that will be used by the `ArgumentBuilder`
>>>> to create the `$__lldb_arg` structure.
>>>> Since all the registers are already mapped to a structure, I should
>>>> be able to support more __DWARF Operations__ in the future.
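
A minimal sketch of the `DW_OP_fbreg` case, assuming the frame base is the value of `rbp` saved by the trampoline (typical for x86_64 code built at -O0, but an assumption here). The `RegisterContext` layout, the fields of `ArgumentMetadata`, and the names `ResolveFbreg` and `BuildArgs` are hypothetical and only illustrate the address computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of the registers spilled by the trampoline; only the
// frame base register is needed for DW_OP_fbreg.
struct RegisterContext {
  std::uint64_t rbp;
};

// Hypothetical per-variable record, computed when the breakpoint is set.
struct ArgumentMetadata {
  std::int64_t fbreg_offset; // signed operand of DW_OP_fbreg
  std::size_t size;          // size of the variable in bytes
};

// Address of a variable described by `DW_OP_fbreg <offset>`:
// frame base + offset (unsigned wrap-around handles negative offsets).
std::uint64_t ResolveFbreg(const RegisterContext &regs, std::int64_t offset) {
  return regs.rbp + static_cast<std::uint64_t>(offset);
}

// Builds the list of variable addresses that would be written into the
// argument structure, in the order of the metadata vector.
std::vector<std::uint64_t>
BuildArgs(const RegisterContext &regs,
          const std::vector<ArgumentMetadata> &vars) {
  std::vector<std::uint64_t> addresses;
  addresses.reserve(vars.size());
  for (const ArgumentMetadata &var : vars) {
    assert(var.size > 0 && "a resolved variable must have a size");
    addresses.push_back(ResolveFbreg(regs, var.fbreg_offset));
  }
  return addresses;
}
```

Because every input except the register value is known when the breakpoint is set, the generated code only has to perform one addition per variable at runtime.
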
>>>> After collecting some metrics on the __Clang__ binary, built at __-O0__,
>>>> the debug info shows that the four most used DWARF Operations account for
>>>> more than __99%__ of all occurrences:
>>>> |DWARF Operation|         Occurrences       |
>>>> |---------------|---------------------------|
>>>> |DW\_OP_fbreg   |         2 114 612         |
>>>> |DW\_OP_reg     |           820 548         |
>>>> |DW\_OP_constu  |           267 450         |
>>>> |DW\_OP_addr    |            17 370         |
>>>> |   __Top 4__   | __3 219 980 Occurrences__ |
>>>> |   __Total__   | __3 236 859 Occurrences__ |
>>>> Those 4 operations are the ones that I'll support for now.
>>>> To support more complex expressions, we would need to JIT-compile
>>>> a DWARF expression interpreter.
>>>> ### Unwinders
>>>> When the program hits the injected trap instruction, the execution stops
>>>> inside the injected UserExpression.
>>>> ```
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
>>>>  * frame #0: 0x00000001001070b9 $__lldb_expr`$__lldb_expr($__lldb_arg=0x00000001f5671000) at lldb-33192c.expr:49
>>>>    frame #1: 0x0000000100105028
>>>> ```
>>>> This part of the program should be transparent to the user. To allow LLDB to
>>>> elide the condition checker and the FCBT frame, the Unwinder needs to be
>>>> able to identify all of the frames, up to the user's source code frame.
>>>> The injected UserExpression already has a valid stack frame, but it doesn't
>>>> have any information about its caller, the Trampoline. In order to unwind 
>>>> to
>>>> the user's code, LLDB needs symbolic information for the trampoline.
>>>> This information is tied to LLDB modules, created using an ObjectFile
>>>> representation, the __ObjectFileTrampoline__ in our case.
>>>> It will contain several pieces of information such as, the module's name 
>>>> and
>>>> description, but most importantly the module __Symbol Table__ that will 
>>>> have
>>>> the trampoline symbol (`$__lldb_injected_conditional_bp_trampoline `) and a
>>>> __Text Section__ that will tell the unwinder the trampoline bounds.
>>>> Then, LLDB inserts a __Function Unwinder__ in the module UnwindTable and
>>>> creates an __Unwind Plan__ pointing to the BreakpointLocation return 
>>>> address.
>>>> This is done taking into consideration that the trampoline will alter the
>>>> memory layout by spilling registers to the stack.
>>>> Finally, the newly created module is appended to the target image list, 
>>>> which
>>>> allows LLDB to move between the injected code and the user code seamlessly.
>>>> This is what the backtrace looks like after hitting the injected trap:
>>>> ```
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
>>>>    frame #0: 0x00000001001070b9 $__lldb_expr`$__lldb_expr($__lldb_arg=0x00000001f4c71000) at lldb-ca98b7.expr:49
>>>>    frame #1: 0x0000000100105028 $__lldb_injected_conditional_bp_trampoline`$__lldb_injected_conditional_bp_trampoline + 40
>>>>  * frame #2: 0x0000000100000f5b main`main at main.c:7:23
>>>> ```
>>>> For now, LLDB selects the user frame but the goal would be to mask all the
>>>> frames introduced by the Fast Conditional Breakpoint.
>>>> A `debug-injected-condition` setting will make it possible to stop at the FCBT
>>>> and show all the elided frames.
>>> Regarding unwinding, I am wondering whether we really need to do anything 
>>> special. It sounds to me that if we try a little bit harder then we 
>>> could make the trampoline code look very much like a signal handler, and 
>>> have it be treated as such. Then the only special thing we would need to do 
>>> is to hide the topmost trampoline code somewhere higher up in the 
>>> presentation layer.
>>> I am imagining the trampoline code could look something like this (excuse 
>>> my bad assembly, I haven't written that in a while):
>>> pushq %rax
>>> pushq %rbx
>>> ...
>>> leaq $SIZE_OF_REGISTER_CONTEXT(%rsp), %r10 # void *registers
>>> movq %rsp, %r11 # void *args
>>> subq $SIZE_OF_ARGS, %rsp
>>> movq %r10, %rdi
>>> movq %r11, %rsi
>>> callq __build_args # __build_args(const void *registers, void *args)
>>> movq %r11, %rdi
>>> callq __lldb_expr # __lldb_expr(void *args)
>>> test %al, %al
>>> jz .Ldone
>>> trap_opcode:
>>> int3
>>> .Ldone:
>>> addq $SIZE_OF_ARGS, %rsp
>>> # pop everything, execute displaced instructions and jump back
>>> I think this trampoline is pretty similar to what you're proposing, but 
>>> there are a couple of subtle differences:
>>> - the args structure is allocated on the stack - I already spoke about that
>>> - the testing of the condition happens inside the trampoline
>>> I think this second item has several advantages. Firstly, this means that 
>>> when we hit the breakpoint, we only have one extra frame on the stack. So even 
>>> if we don't do any extra work in the debugger to hide this stuff, we don't 
>>> clutter the stack too much.
>>> Secondly, this means we can avoid the "disassemble and scan for trap 
>>> opcode" step, which is kind of a hack -- after all, we generated these 
>>> instructions, so we should _know_ where the trap opcode is. This way, you 
>>> can emit a special symbol (trap_opcode label in the example above), that 
>>> lldb can then search for, and know its location exactly.
>> I think testing the condition inside the trampoline might be very limiting:
>> - The variable resolution would need to be rethought to allow the 
>> condition check to happen in the trampoline.
>> - To be able to support different condition types (expression / thread name 
>> / thread id …), the $__lldb_expr is a better option IMO. In the future, we 
>> might also inject logging code that would only be run according to the 
>> condition.
>> - This feature requires at least one more frame (for your approach), that 
>> would still need to be hidden from the user. I don’t think hiding 2 frames is 
>> more work than hiding 1.
> I might be the one misunderstanding, but I think you missed Pavel’s point. In 
> Pavel’s model, you still JIT the condition into __lldb_expr and pass it the 
> argument structure. The difference is that you don’t have the trap inside of 
> the JITed code, you have the JITed code return whether to stop or not and 
> have the trampoline hit the trap depending on the return value. I agree this 
> seems cleaner than scanning the output to find the trap.
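
A minimal sketch of that model, written in C++ rather than assembly to keep it readable. It assumes the argument structure holds the addresses of the variables used in the condition; `Args`, `ConditionHolds`, `TrampolineBody` and `RecordTrap` are hypothetical names, and `RecordTrap` stands in for the single `int3` that the real trampoline would contain. The real trampoline would still need architecture-specific code to read the return value.

```cpp
#include <cassert>

// Hypothetical stand-in for the `$__lldb_arg` structure: it holds the
// addresses of the variables used by the condition.
struct Args {
  const int *i;
};

// Stand-in for the trap instruction so that the sketch can run outside a
// debugger; it only counts how many times the trap would have been hit.
int g_trap_hits = 0;
void RecordTrap() { ++g_trap_hits; }

// What the JIT-ed expression would do in this model: evaluate the condition
// (here, hypothetically, `i == 42`) and return the result instead of trapping.
bool ConditionHolds(const Args *args) {
  assert(args != nullptr && args->i != nullptr);
  return *args->i == 42;
}

using TrapFn = void (*)();

// What the trampoline would do: call the expression, then test the return
// value and trap at a single location whose address is known in advance.
void TrampolineBody(const Args *args, TrapFn trap) {
  if (ConditionHolds(args))
    trap();
}
```

Since the trap lives in code LLDB emitted itself, its address can be recorded when the trampoline is built instead of being recovered by disassembling the compiled expression.
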

Inserting the trap in the trampoline would still require to fetch the 
$__lldb_expr's return value (architecture-specific) and write an assembly check 
statement (compare and jump).
Right now, all of this is abstracted by the UserExpression.

I do agree that it’s cleaner, and will take it into consideration for my next 

> Fred   
>>> And lastly, and this is the most important advantage IMO, is that we are in 
>>> full control of the kind of unwind info we generate for the trampoline. We 
>>> can generate the proper eh_frame info for this trampoline which would 
>>> correctly describe the locations of the registers of the previous frame, so 
>>> that lldb would automatically be able to find them and display them 
>>> properly when you do for instance "register read" with the parent frame 
>>> selected. Hopefully, all this would take is a couple of well-placed .cfi 
>>> assembler instructions.
>>> Here, I'm imagining we could use the MC layer in llvm to do this, 
>>> either by feeding it a raw assembler string, or by using its C++ API, 
>>> whichever is easier. Then we could feed this to the normal jit together 
>>> with the compiled c++ expression and it would link it all together and load 
>>> it into memory.
>>>> ### Instruction Shifter (WIP)
>>>> Because some instructions might use operands that are at offsets relative
>>>> to the program counter, copying the instructions to a new location might
>>>> change their meaning, so LLDB needs to patch each instruction with the
>>>> right offset.
>>>> This is done using `llvm::MCInst` in order to detect the instructions
>>>> that need to be rewritten.
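
A minimal sketch of the offset fix-up for one common case, an x86_64 instruction with a 32-bit RIP-relative displacement. It assumes the displacement and the instruction length have already been extracted by a disassembler; `RipRelativeTarget` and `RelocateDisp32` are hypothetical helpers, not part of the patches.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Address referenced by a RIP-relative operand: the displacement is relative
// to the address of the next instruction.
std::uint64_t RipRelativeTarget(std::uint64_t insn_addr, std::uint8_t insn_len,
                                std::int32_t disp) {
  assert(insn_len > 0 && insn_len <= 15 && "x86 instructions are 1-15 bytes");
  return insn_addr + insn_len +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// Hypothetical helper: displacement to encode once the instruction is moved
// from `old_addr` to `new_addr`, so that it still references the same target.
// The instruction length is unchanged, so it cancels out:
//   new_disp = disp + (old_addr - new_addr)
// Returns std::nullopt when the result no longer fits in 32 bits, in which
// case the instruction cannot simply be copied.
std::optional<std::int32_t> RelocateDisp32(std::int32_t disp,
                                           std::uint64_t old_addr,
                                           std::uint64_t new_addr) {
  const auto delta = static_cast<std::int64_t>(old_addr - new_addr);
  const std::int64_t new_disp = delta + disp;
  if (new_disp < std::numeric_limits<std::int32_t>::min() ||
      new_disp > std::numeric_limits<std::int32_t>::max())
    return std::nullopt;
  return static_cast<std::int32_t>(new_disp);
}
```

Checking that `RipRelativeTarget` returns the same address before and after the move gives a small unit test of the fix-up that does not need a running inferior.
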
>>>> ## Risk Mitigation
>>>> The optimization relies heavily on code injection, most of which is
>>>> architecture specific. Because of this, overwriting the instructions
>>>> can fail depending on the breakpoint location, e.g.:
>>>> - If the overwritten instructions contain indirection (branch instructions).
>>>> - If the overwritten instructions are a branch target.
>>>> - If there are not enough instructions to insert the branch instruction (x86_64).
>>>> If the setup process fails to insert the Fast Conditional Breakpoint, it will
>>>> fall back to the legacy behavior, and warn the user about what went wrong.
>>> Another possible fallback behavior would be to still do the whole 
>>> trampoline stuff and everything, but avoid needing to overwrite opcodes in 
>>> the target by having the gdb stub do this work for us. So, we could teach 
>>> the stub that some addresses are special and when a breakpoint at this 
>>> location gets hit, it should automatically change the program counter to 
>>> some other location (the address of our trampoline) and let the program 
>>> continue. This way, you would only need to insert a single trap 
>>> instruction, which is what we know how to do already. And I believe this 
>>> would still bring a major speedup compared to the current implementation 
>>> (particularly if the target is remote on a high-latency link, but even in 
>>> the case of local debugging, I would expect maybe an order of magnitude 
>>> faster processing of conditional breakpoints).
>>> This would be kind of similar to the "cond_list" in the gdb-remote 
>>> "Z0;addr,kind;cond_list" packet 
>>> <https://sourceware.org/gdb/onlinedocs/gdb/Packets.html>.
>>> In fact, given that this "instruction shifting" is the most unpredictable 
>>> part of this whole architecture (because we don't control the contents of 
>>> the inferior instructions), it might make sense to do this approach first, 
>>> and then do the instruction shifting as a follow-up.
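
A minimal sketch of the stub-side bookkeeping this fallback would need, assuming the stub keeps a table from breakpoint address to trampoline address. `RedirectTable` is a hypothetical class for illustration; it does not correspond to an existing lldb-server or debugserver feature or to an existing packet.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stub-side table of "special" breakpoint addresses.
class RedirectTable {
public:
  // Registers a breakpoint whose hits should be redirected to a trampoline.
  void Add(std::uint64_t breakpoint_addr, std::uint64_t trampoline_addr) {
    assert(breakpoint_addr != trampoline_addr);
    redirects_[breakpoint_addr] = trampoline_addr;
  }

  // Called when a trap is hit, with `pc` already adjusted to the breakpoint
  // address. Returns true and rewrites `pc` when the thread can be resumed
  // at the trampoline without notifying the debugger; returns false when the
  // stop has to be reported as usual.
  bool HandleTrap(std::uint64_t &pc) const {
    const auto it = redirects_.find(pc);
    if (it == redirects_.end())
      return false;
    pc = it->second;
    return true;
  }

private:
  std::unordered_map<std::uint64_t, std::uint64_t> redirects_;
};
```

In this scheme the only patch in the inferior's code is the ordinary single trap instruction, so the cases where overwriting several instructions is unsafe do not arise.
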
>>>> One way to mitigate those limitations would be to use code instrumentation
>>>> to detect if it's safe to set a Fast Conditional Breakpoint at a certain
>>>> location, and hint the user to move the FCB before or after the location 
>>>> where
>>>> it was set originally.
>>>> ## Prototype Code
>>>> I submitted my patches ([1](https://reviews.llvm.org/D66248),
>>>> [2](https://reviews.llvm.org/D66249),
>>>> [3](https://reviews.llvm.org/D66250)) on Phabricator with the prototype.
>>>> ## Feedback
>>>> Before moving forward I'd like to get the community's input. What do you
>>>> think about this approach? Any feedback would be greatly appreciated!
>>>> Thanks,
>>> As my last suggestion, I would like to ask you to consider testing as 
>>> you're writing this code. This is a pretty complex machinery you're 
>>> building, and it would be nice if it was possible to test pieces of it in 
>>> isolation instead of just the large end-to-end kinds of tests. For example, 
>>> in the "instruction shifter" machinery, it would be nice to be able to take 
>>> a single instruction, execute both in place, and in a "shifted" location, 
>>> and assert that the resulting register contents are identical.
>> Will do.
>>> regards,
>>> pavel
>> Thanks,
>> Ismail.
