On 15/08/2019 20:15, Jim Ingham wrote:
Thanks for your great comments.  A few replies...

On Aug 15, 2019, at 10:10 AM, Pavel Labath via lldb-dev 
<lldb-dev@lists.llvm.org> wrote:
I am wondering whether we really need to involve the memory allocation 
functions here. What's the size of this address structure? I would expect it to 
be relatively small compared to the size of the entire register context that we 
have just saved to the stack. If that's the case, the case then maybe we could 
have the trampoline allocate some space on the stack and pass that as an 
argument to the $__lldb_arg building code.

You have no guarantee that only one thread is running this code at any given 
time.  So you would have to put a mutex in the condition to guard the use of 
this stack allocation.  That's not impossible but it means you're changing 
threading behavior.  Calling the system allocator might take a lock but a lot 
of allocation systems can hand out small allocations without locking, so it 
might be simpler to just take advantage of that.

I am sorry, but I am confused. I am suggesting we take a slice of the stack from the thread that happened to hit that breakpoint, and use that memory for the __lldb_arg structure for the purpose of evaluating the condition on that very thread. If two threads hit the breakpoint simultaneously, then we just allocate two chunks of memory on their respective stacks. Or am I misunderstanding something about how this structure is supposed to be used?



Another possible fallback behavior would be to still do the whole trampoline 
stuff and everything, but avoid needing to overwrite opcodes in the target by 
having the gdb stub do this work for us. So, we could teach the stub that some 
addresses are special and when a breakpoint at this location gets hit, it 
should automatically change the program counter to some other location (the 
address of our trampoline) and let the program continue. This way, you would 
only need to insert a single trap instruction, which is what we know how to do 
already. And I believe this would still bring a major speedup compared to the 
current implementation (particularly if the target is remote on a high-latency 
link, but even in the case of local debugging, I would expect maybe an order of 
magnitude faster processing of conditional breakpoints).

This is a clever idea.  It would also mean that you wouldn't have to figure out 
how to do register saves and restores in code, since debugserver already knows 
how to do that, and once you are stopped it is probably not much slower to have 
debugserver do that job than have the trampoline do it.  It also has the 
advantage that you don't need to deal with the problem where the space that you 
are able to allocate for the trampoline code is too far away from the code you 
are patching for a simple jump.  It would certainly be worth seeing how much 
faster this makes conditions.

I actually thought we would use the exact same trampoline that would be used for the full solution (so it would do the register saves, restores, etc), and the stub would only help us to avoid trampling over a long sequence of instructions. But other solutions are certainly possible too...


Unless I'm missing something, you would still need two traps: one in the main instruction stream and one to stop when the condition is true. But maybe you meant "a single kind of insertion - a trap", not "a single trap instruction".

I meant "a single trap in the application's instruction stream". The number of traps in the code that we generate isn't that important, as we can do what we want there. But if we insert just a single trap opcode, then we are guaranteed to overwrite only one instruction, which means the whole "are we overwriting a jump target" discussion becomes moot. OTOH, if we write a full jump sequence then we can overwrite a *lot* of instructions -- the shortest sequence that can jump anywhere in the address space I can think of is something like pushq %rax; movabsq $WHATEVER, %rax; jmpq *%rax. Something as big as that is fairly likely to overwrite a jump target.


...


This would be kind of similar to the "cond_list" in the gdb-remote 
"Z0,addr,kind;cond_list" packet <https://sourceware.org/gdb/onlinedocs/gdb/Packets.html>.

In fact, given that this "instruction shifting" is the most unpredictable part 
of this whole architecture (because we don't control the contents of the inferior 
instructions), it might make sense to do this approach first, and then do the instruction 
shifting as a follow-up.

One side-benefit we are trying to get out of the instruction shifting approach is avoiding, 
as often as possible, having to stop all threads when a breakpoint is hit.  Since we can 
inject thread ID tests into the condition as well, doing the instruction shifting would 
mean you could specify thread-specific breakpoints, and then ONLY the threads that match 
the thread specification would ever have to be stopped.  You could also have negative 
tests so that you could specify "no stop" threads.  So I still think it is 
worthwhile pursuing the full implementation Ismail outlined in the long run.

No argument there. I'm just proposing this as a stepping stone towards the final goal.

Interestingly, this is one of the places where the otherwise annoying Linux ptrace behavior may come in really handy. Since a thread hitting a breakpoint does not automatically stop all other threads in the process (we have to manually stop all of them ourselves), the lldb-server could do the trampoline stuff without any of the other threads in the process noticing anything.

pl