On 15/08/2019 20:15, Jim Ingham wrote:
> Thanks for your great comments. A few replies...
>
> On Aug 15, 2019, at 10:10 AM, Pavel Labath via lldb-dev
> <lldb-dev@lists.llvm.org> wrote:
>> I am wondering whether we really need to involve the memory allocation
>> functions here. What's the size of this address structure? I would
>> expect it to be relatively small compared to the size of the entire
>> register context that we have just saved to the stack. If that's the
>> case, then maybe we could have the trampoline allocate some space on
>> the stack and pass that as an argument to the $__lldb_arg building code.
> You have no guarantee that only one thread is running this code at any
> given time, so you would have to put a mutex in the condition to guard
> the use of this stack allocation. That's not impossible, but it means
> you're changing threading behavior. Calling the system allocator might
> take a lock, but a lot of allocation systems can hand out small
> allocations without locking, so it might be simpler to just take
> advantage of that.
I am sorry, but I am confused. I am suggesting we take a slice of the
stack from the thread that happened to hit that breakpoint, and use that
memory for the __lldb_arg structure for the purpose of evaluating the
condition on that very thread. If two threads hit the breakpoint
simultaneously, then we just allocate two chunks of memory on their
respective stacks. Or am I misunderstanding something about how this
structure is supposed to be used?
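To make that concrete, here is a rough C-level sketch of what the
generated trampoline would be doing under this scheme (all of these names
are illustrative, not actual lldb symbols):

  struct SavedRegisters { /* spilled GPRs, flags, ... */ };
  struct LLDBArg { /* materialized variable addresses */ };

  void SaveRegisters(SavedRegisters *);
  void RestoreRegisters(SavedRegisters *);
  void BuildLLDBArg(LLDBArg *);
  bool EvaluateCondition(LLDBArg *);

  // Runs on whichever thread hit the trap. Everything lives in that
  // thread's own frame, so concurrent hits need no lock and no allocator.
  void Trampoline() {
    SavedRegisters regs;        // register context, saved to the stack
    SaveRegisters(&regs);
    LLDBArg arg;                // the proposed stack slice for $__lldb_arg
    BuildLLDBArg(&arg);         // fill in the addresses the condition uses
    if (EvaluateCondition(&arg))
      __builtin_trap();         // condition true: stop for the debugger
    RestoreRegisters(&regs);
    // ... re-execute the displaced instruction(s) and jump back ...
  }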
>> Another possible fallback behavior would be to still do the whole
>> trampoline stuff and everything, but avoid needing to overwrite opcodes
>> in the target by having the gdb stub do this work for us. So, we could
>> teach the stub that some addresses are special and when a breakpoint at
>> this location gets hit, it should automatically change the program
>> counter to some other location (the address of our trampoline) and let
>> the program continue. This way, you would only need to insert a single
>> trap instruction, which is what we know how to do already. And I
>> believe this would still bring a major speedup compared to the current
>> implementation (particularly if the target is remote on a high-latency
>> link, but even in the case of local debugging, I would expect maybe an
>> order of magnitude faster processing of conditional breakpoints).
> This is a clever idea. It would also mean that you wouldn't have to
> figure out how to do register saves and restores in code, since
> debugserver already knows how to do that, and once you are stopped it is
> probably not much slower to have debugserver do that job than to have
> the trampoline do it. It also has the advantage that you don't need to
> deal with the problem where the space that you are able to allocate for
> the trampoline code is too far away from the code you are patching for a
> simple jump. It would certainly be worth seeing how much faster this
> makes conditions.
I actually thought we would use the exact same trampoline that would be
used for the full solution (so it would do the register saves, restores,
etc), and the stub would only help us to avoid trampling over a long
sequence of instructions. But other solutions are certainly possible too...
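For illustration, the stub-side redirection could look something like
this (a sketch only; the hook and type names are invented, not real
lldb-server API):

  #include <cstdint>
  #include <unordered_map>

  struct NativeThread {           // stand-in for the stub's thread object
    uint64_t GetPC();
    void SetPC(uint64_t pc);
    void Resume();
  };

  constexpr uint64_t kTrapOpcodeSize = 1; // 0xcc on x86-64

  // trap address -> trampoline entry, registered by the debugger up front
  std::unordered_map<uint64_t, uint64_t> g_redirects;

  // Hypothetical hook called when a thread reports SIGTRAP.
  bool MaybeRedirect(NativeThread &thread) {
    uint64_t pc = thread.GetPC() - kTrapOpcodeSize; // rewind over the trap
    auto it = g_redirects.find(pc);
    if (it == g_redirects.end())
      return false;               // ordinary breakpoint: report the stop
    thread.SetPC(it->second);     // enter the trampoline instead
    thread.Resume();              // no round trip to the debugger at all
    return true;
  }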
> Unless I'm missing something, you would still need two traps: one in the
> main instruction stream and one to stop when the condition is true. But
> maybe you meant "a single kind of insertion - a trap", not "a single
> trap instruction".
I meant "a single in the application's instruction stream". The counts
of traps in the code that we generate aren't that important, as we can
do what we want there. But if we insert just a single trap opcode, then
we are guaranteed to overwrite only one instruction, which means the
whole "are we overwriting a jump target" discussion becomes moot. OTOH,
if we write a full jump code then we can overwrite a *lot* of
instructions -- the shortest sequence that can jump anywhere in the
address space I can think of is something like pushq %rax; movabsq
$WHATEVER, %rax; jmpq *%rax. Something as big as that is fairly likely
to overwrite a jump target.
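For scale, that sequence encodes to 13 bytes (standard x86-64 encodings;
the helper function is made up for illustration):

  #include <cstdint>
  #include <cstring>

  // pushq %rax; movabsq $target, %rax; jmpq *%rax
  // = 1 + 10 + 2 = 13 bytes, versus the single 0xcc byte of a trap.
  void EncodeJumpViaRax(uint8_t out[13], uint64_t target) {
    out[0] = 0x50;                      // pushq %rax
    out[1] = 0x48; out[2] = 0xb8;       // movabsq $imm64, %rax
    std::memcpy(&out[3], &target, 8);   // 8-byte absolute address
    out[11] = 0xff; out[12] = 0xe0;     // jmpq *%rax
  }

Thirteen bytes can easily span several instructions, which is exactly the
jump-target hazard described above.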
>> ...
>> This would be kind of similar to the "cond_list" in the gdb-remote
>> "Z0,addr,kind;cond_list" packet
>> <https://sourceware.org/gdb/onlinedocs/gdb/Packets.html>.
>> In fact, given that this "instruction shifting" is the most
>> unpredictable part of this whole architecture (because we don't control
>> the contents of the inferior instructions), it might make sense to do
>> this approach first, and then do the instruction shifting as a
>> follow-up.
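(For reference, the cond_list entries in the linked docs have roughly
this shape, where expr is hex-encoded gdb agent bytecode:

  Z0,<addr>,<kind>;X<len>,<expr>

so the stub, rather than the debugger, evaluates the condition on each
hit.)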
> One side benefit we are trying to get out of the instruction-shifting
> approach is to avoid, as far as possible, having to stop all threads
> when inserting breakpoints. Since we can inject thread ID tests into the
> condition as well, doing the instruction shifting would mean you could
> specify thread-specific breakpoints, and then ONLY the threads that
> match the thread specification would ever have to be stopped. You could
> also have negative tests so that you could specify "no stop" threads. So
> I still think it is worthwhile pursuing the full implementation Ismail
> outlined in the long run.
No argument there. I am just proposing this as a stepping stone towards
the final goal.
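The thread-ID test Jim mentions could be prepended to the generated
condition along these lines (a sketch; the names are invented):

  #include <cstdint>

  bool EvaluateUserCondition();     // the compiled user expression
  extern uint64_t g_wanted_tid;     // from the breakpoint's thread spec

  // Non-matching threads fall through the trampoline and never stop.
  bool CheckCondition(uint64_t current_tid) {
    if (current_tid != g_wanted_tid)
      return false;                 // wrong thread: resume immediately
    return EvaluateUserCondition(); // right thread: run the real condition
  }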
Interestingly, this is one of the places where the otherwise annoying
linux ptrace behavior may come in really handy. Since a thread hitting a
breakpoint does not automatically stop all other threads in the process
(we have to manually stop all of them ourselves), the lldb-server could
do the trampoline stuff without any of the other threads in the process
noticing anything.
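A minimal illustration of that ptrace property (Linux-specific, error
handling omitted):

  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  // Under Linux ptrace, a SIGTRAP suspends only the reporting thread;
  // its siblings keep running unless the tracer stops them explicitly.
  void HandleTrap() {
    int status;
    pid_t tid = waitpid(-1, &status, __WALL);   // only this thread stopped
    // ... do the trampoline bookkeeping for `tid` here ...
    ptrace(PTRACE_CONT, tid, nullptr, nullptr); // resume just that thread
  }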
pl