labath added a comment.

In D83446#2142030 <https://reviews.llvm.org/D83446#2142030>, @JDevlieghere 
wrote:

> In D83446#2141713 <https://reviews.llvm.org/D83446#2141713>, @labath wrote:
>
> > I'm wondering if this problem does not go beyond reproducers... The fact 
> > that we can have two threads running in parallel, doing stuff, without much 
> > synchronization seems like a bad idea in general. In theory, something 
> > similar could happen during normal operation as well, if the user was fast 
> > enough to type "cont<Return>" before the stop reason is printed. In the 
> > non-reproducer mode we wouldn't crash with the unexpected packet assertion, 
> > but we would probably do something unexpected nonetheless. Of course, a 
> > real human user won't be fast enough to do that, but I wouldn't rule out 
> > this being the cause of the flakiness of some of our pexpect tests. And I'm 
> > not sure what happens when sourcing command files, which is fairly similar 
> > to running a reproducer. How would the events be processed there?
>
>
> Agreed. I believe at some point (long before I started working on LLDB) all 
> the GDB communication was limited to a single thread, but the overhead of 
> having to talk and synchronize with that thread was too costly and it got 
> relaxed.


I don't believe that is relevant here. We still have a gdb lock, which ensures 
that at most one thread is communicating with the server at any given time, 
even though it's not always the same thread. But that lock is far too 
fine-grained -- it works at the level of individual packets. What we need here 
is a way to ensure that the process of fetching the stop reason (as a whole) 
does not interleave with the subsequent command.

> 
> 
>> So, I'm wondering if we shouldn't add some form of synchronization here, 
>> which would be independent of the reproducer mode we're running in. That 
>> would solve the reproducer problem as a consequence, but it seems like it 
>> would be generally useful. One way to do that could be to send some sort of 
>> a ping event (EventDataReceipt ?) to the event handler thread before or 
>> after running a command, and wait for it to report back. That should ensure 
>> that all events produced by the previous command have been fully processed...
> 
> How would this be different from synchronous mode and how would you make this 
> work in asynchronous mode?

The way I see it, this is mostly orthogonal to the (a)synchronous mode issue. I 
consider the problem to be in the `gdb-remote` command, which behaves the same 
way in both modes. It causes an eBroadcastBitStateChanged/eStateStopped event 
to be sent, and in response to that the default event handler prints the stop 
reason string. Normally this happens very quickly -- so quickly that one could 
be forgiven for considering the stop reason string to be the "output" of the 
gdb-remote command. My proposal is to not consider this command "complete" 
until that event has been processed, essentially making that string _be_ the 
output of the command.

For the asynchronous mode, I do think that we will need some form of a global 
"clock". I'm just not sure that this is the right form for it. To me it seems 
that this is mostly about the relationship between the command interpreter and 
the gdb-remote protocol -- if I do an asynchronous "process continue", followed 
(10 seconds later) by a "process interrupt", the gdb replayer needs to know 
that it should not respond to the $c packet, but instead wait for the ^C 
interrupt packet which will come from the interrupt command.

Maybe you could try explaining how you wanted to make this work for 
asynchronous mode. (Btw, I think the example above may not even need a global 
clock -- the gdb replayer can just note that the $c packet should not be 
replied to, and then send the stop-reply as a response to the ^C packet.)
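
For illustration, the deferred-reply idea could look something like this. This 
is a hedged sketch, not the actual reproducer code -- `Replayer`, `OnPacket`, 
and the canned `$T02` stop-reply are hypothetical names, and real gdb-remote 
packets also carry checksums that are omitted here:

```cpp
#include <optional>
#include <string>

// Hypothetical sketch of a replayer that defers the stop-reply: a $c
// (continue) packet gets no immediate answer; the recorded stop-reply is
// instead sent when the ^C interrupt arrives.
class Replayer {
public:
  // Returns the reply to send now, or nullopt if the reply is deferred.
  std::optional<std::string> OnPacket(const std::string &packet) {
    if (packet == "$c") {
      stop_reply_pending = true; // we owe a stop-reply, but don't send it yet
      return std::nullopt;
    }
    if (packet == "^C" && stop_reply_pending) {
      stop_reply_pending = false;
      return "$T02"; // canned stop-reply (SIGINT), checksum omitted
    }
    return "$OK"; // all other packets: replay a recorded answer (sketch)
  }

private:
  bool stop_reply_pending = false;
};
```

The point being that the replayer needs no notion of global time for this 
case, only a small amount of per-connection state.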

> As an aside: the first thing I considered here was making this particular 
> command //synchronous// by blocking until we get the stop event.

Now, here I get the feeling that one of us is misunderstanding the 
(a)synchronous concept. The way I remember it, when we want to run a 
synchronous command, we wait for the Stopped event by injecting a custom 
listener between the source and the final listener. This custom listener 
fetches the event, does something, and then broadcasts it to the final 
listener. This ensures that the process is stopped and that most of the work 
that should happen when it stops has happened, but it does _not_ synchronize 
with the processing done by the final listener (which is what we want here).
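
A tiny sketch of that shape (hypothetical names, reduced to bare essentials -- 
not the real Listener/Broadcaster API): the shim runs the synchronous-mode 
work when it sees the stop event and then forwards the event onward. If the 
final listener merely queues the event for another thread, nothing here waits 
for that thread's processing to finish:

```cpp
#include <functional>
#include <string>

// An "event" and a "listener" reduced to their bare essentials.
struct Event { std::string name; };
using Listener = std::function<void(const Event &)>;

// Wrap `final_listener` so that `on_stop` runs first when a stop event
// arrives, and the event is then rebroadcast. Forwarding is fire-and-forget:
// if `final_listener` hands the event off to another thread, this call does
// not wait for that processing to complete.
inline Listener Interpose(Listener final_listener,
                          std::function<void(const Event &)> on_stop) {
  return [=](const Event &ev) {
    if (ev.name == "eStateStopped")
      on_stop(ev);        // synchronous-mode bookkeeping happens here
    final_listener(ev);   // ...then the event continues to its destination
  };
}
```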

> During replay the execution is always synchronous (which is something I don't 
> like and was hoping to get rid of with this patch). That would only solve 
> the problem for this command and adds a bunch of assumptions about 
> connections resulting in stops. I know this is always true for gdb platforms, 
> but maybe not for other platforms.

Actually, I wanted this to be fully general -- after the execution of _any_ 
command, we make sure to drain _all_ events from the default event handler 
queue. This handling can still be done in the Debugger object; all the command 
interpreter would need to do is call `debugger->DrainEventHandler()` at an 
appropriate time.

> Also currently the debugger is the only one that deals with these events, I'm 
> not super excited to have that logic spread out with some commands handling 
> it synchronously and others doing it asynchronously.



> Based on your description I don't yet see how the `EventDataReceipt` thingy 
> would work. If we block from the moment the event is posted, then what's the 
> goal of having the event handling logic running in a separate thread? If we 
> block between when the event is being handled until it finished, then how 
> would you avoid a race between when the event is broadcast and when it's 
> being handled? Can you elaborate a bit on this?

For this particular case, I don't think there's an advantage (and there 
certainly are disadvantages) to doing this in a separate thread. IIUC (I also 
wasn't present at that time), the event handler thread exists so that we can 
respond to changes made through the SB API. So, if something does the SB 
equivalent of "gdb-remote" or "frame select" or whatever, we can still print a 
message to the console saying that the state of the process has changed. 
Reusing this mechanism for the command line kind of makes sense, since we want 
things to also work the other way around -- have the IDE respond to process 
changes which are being made through the command line.

The way I was imagining this working is that the `DrainEventHandler` function 
would send a special event to the default event handler thread, and wait until 
it gets processed. Since the events are processed in FIFO order, this would 
ensure that all preceding events have been processed as well. Note that this 
would go to the default event handler even if the process events happen to be 
going to a different listener (e.g. one provided by the IDE). In that case the 
default event handler would just return straight away -- which is fine, because 
in this case the stop reason would not get printed.

That said, the talk of SB API has reminded me that a similar race can exist 
between the SB commands and the event handler thread. Given that the SB 
replayer also streams SB commands as fast as it can, the same race can occur if 
the user does the SB equivalents of the above command line commands (this is a 
problem with both solutions).

Now obviously, we don't want to drain the event queue after every SB call. I'm 
not exactly sure what that means for this approach, though...


Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83446/new/

https://reviews.llvm.org/D83446



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
