I have a question about the handling of SendContinuePacketAndWaitForResponse 
when an attempt to interrupt the process times out.

One oddity I noticed is in the handling of the eStateInvalid return from 
SendContinuePacketAndWaitForResponse (SCPAWFR for short) in 
ProcessGDBRemote::AsyncThread.  When there's an async interrupt request in 
flight, and the ReadPacket call inside SCPAWFR wakes up from its periodic 
timeout and it's been "too long" with nothing from debugserver, SCPAWFR 
returns eStateInvalid.  The code handling that return in the AsyncThread does:

              case eStateInvalid: {
                // Check to see if we were trying to attach and if we got back
                // the "E87" error code from debugserver -- this indicates that
                // the process is not debuggable.  Return a slightly more
                // helpful error message about why the attach failed.
                if (::strstr(continue_cstr, "vAttach") != nullptr &&
                    response.GetError() == 0x87) {
                  process->SetExitStatus(-1, "cannot attach to process due to "
                                             "System Integrity Protection");
                } else if (::strstr(continue_cstr, "vAttach") != nullptr &&
                           response.GetStatus().Fail()) {
                  process->SetExitStatus(-1, response.GetStatus().AsCString());
                } else {
                  process->SetExitStatus(-1, "lost connection");
                }
                break;
              }

So when an interrupt fails, we immediately give up on the process, setting the 
exit status to -1.  

Note, however, that nothing in this case sets "done" to true.  The switch is 
inside a 'while (!done)' loop, so after handling eStateInvalid we go back and 
wait for another packet from the debug stub.
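To make that concrete, here's a tiny standalone model of the loop (toy types 
of my own, not LLDB's API) showing how a case that doesn't set "done" drops 
us straight back into the blocking read:

    #include <cstdio>

    // Toy model of the AsyncThread packet loop -- not LLDB's API.
    enum StateType { eStateStopped, eStateExited, eStateInvalid };

    // Stand-in for SendContinuePacketAndWaitForResponse: pretend the
    // interrupt times out once (eStateInvalid), then debugserver wakes
    // up and delivers the real stop packet.
    static StateType WaitForResponse() {
      static int call = 0;
      return (call++ == 0) ? eStateInvalid : eStateStopped;
    }

    int main() {
      bool done = false;
      while (!done) {
        switch (WaitForResponse()) {
        case eStateInvalid:
          // The real code calls SetExitStatus(-1, ...) here, but
          // without "done = true;" we fall out of the switch and loop
          // around to read another packet...
          std::printf("gave up on the process\n");
          break;
        case eStateStopped:
          // ...so this late stop gets processed even though we already
          // declared the process exited.
          std::printf("late stop handled after we gave up\n");
          done = true;  // end the demo
          break;
        case eStateExited:
          done = true;
          break;
        }
      }
      return 0;
    }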

SetExitStatus calls SetPrivateState(eStateExited), which will produce another 
event, and normally we'll handle that and shut down nicely.  But by returning 
to fetch another packet from debugserver, we're leaving ourselves open to 
getting a delayed stop we are no longer set up to deal with.
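(For reference, Process::SetExitStatus boils down to roughly this -- a 
paraphrase, not the exact source:

    m_exit_status = status;
    m_exit_string = exit_string;
    SetPrivateState(eStateExited);  // queues the exited event

so the exited event gets queued for the internal-state thread whether or not 
the AsyncThread keeps reading packets.)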

The background is that there's a longstanding but infrequent crash in lldb, 
which I'm trying to fix.  I've never been able to reproduce it locally, so I 
only have the end state.  

The crashing thread is the internal-state thread.  It is crashing because it 
was handling a stop event, and went to ask the thread what to do, and the 
ThreadPlanStack is corrupt in some form or other.  The symptoms vary, but they 
all violate basic invariants of the ThreadPlanStack (for instance, it crashes 
because the stack has no elements, which should never be true).  At the time 
of the crash, there's always another thread that is some ways past the "wait 
for the process to stop" part of the Destroy operation on the process, and 
has begun tearing down Process objects.  That's pretty clearly why the 
ThreadPlanStack is in a bad state...

So far as I can see, the only way that could happen is if the attempt to 
interrupt timed out, but we didn't stop listening for packets, and didn't 
immediately shut down the internal-state thread.  So then debugserver woke 
up and sent the stop packet, and the internal-state thread tried to process 
it, but we were already in the middle of Destroy, and that went poorly...

The fact that we go back for another packet after getting eStateInvalid seems 
to be what allows this to happen.

lldb has lacked the "done = true;" that I would have expected here since at 
least the great reformatting, so the history doesn't tell me anything useful.

I added the "done = true;" and ran the testsuite, and there were no new 
failures.  But this code is pretty complex, so there might be some reason for 
waiting to see whether the stub returns another packet after the interrupt 
fails, though I can't think of one.
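Concretely, the change I tried is just this (the same case as quoted above, 
with the flag set; the elided part is unchanged):

    case eStateInvalid: {
      // ... the existing vAttach / SIP error handling and
      // SetExitStatus calls, exactly as quoted above ...
      done = true;  // new: stop reading packets from a connection
                    // we've just declared dead
      break;
    }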

Can anybody else think of a reason not to set "done" to true here?

Jim


