Thanks for the info. We’re running in synchronous mode, so for Continue to return before the process has fully stopped, Process::ResumeSynchronous() must be bobbling the case where a stop event that resumes the process comes in. The only way I can see that happening is if Process::PrivateResume can return before it sets the private state to eStateRunning, and then only when something gets a chance to time out somewhere along the line.
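To make the hypothesis above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not LLDB’s code: `ToyProcess`, `private_resume`, `resume_synchronous`, `ack_timeout` and `scheduled` are hypothetical names, and the strings "running"/"stopped" stand in for eStateRunning/eStateStopped. It assumes the resume hands the continue request to a worker thread and waits only a bounded time for that thread to acknowledge it by setting the state to running. If that wait expires, the synchronous wait for a stop sees the stale "stopped" state from before the resume and returns immediately.

```python
import threading


class ToyProcess:
    """Hypothetical toy model of the suspected race. This is NOT LLDB code.

    A worker thread acknowledges a continue request by switching the private
    state to "running". private_resume() waits only a bounded time for that
    acknowledgment, so under load it can give up and return while the state
    still reads "stopped".
    """

    def __init__(self, ack_timeout):
        self.cond = threading.Condition()
        self.state = "stopped"
        self.stop_id = 0                    # bumped every time the inferior stops
        self.ack_timeout = ack_timeout
        self.scheduled = threading.Event()  # set when the worker "gets CPU time"
        self.log = []

    def _async_worker(self, ack):
        self.scheduled.wait()               # on a loaded machine this can be late
        with self.cond:
            self.state = "running"          # acknowledge the continue
            self.cond.notify_all()
        ack.set()
        # ... the inferior runs for a while and then stops again ...
        with self.cond:
            self.state = "stopped"
            self.stop_id += 1
            self.cond.notify_all()

    def private_resume(self):
        ack = threading.Event()
        threading.Thread(target=self._async_worker, args=(ack,), daemon=True).start()
        if not ack.wait(self.ack_timeout):
            # Bounded wait expired: return although the state is not "running".
            self.log.append("Resume timed out")

    def resume_synchronous(self):
        """Resume, then block until the private state reads "stopped"."""
        self.private_resume()
        with self.cond:
            # If private_resume() timed out, the state is still the stale
            # "stopped" from before the resume, so this returns at once.
            self.cond.wait_for(lambda: self.state == "stopped")
            return self.stop_id


if __name__ == "__main__":
    # Worker is not scheduled within the bounded wait: early return.
    slow = ToyProcess(ack_timeout=0.1)
    print(slow.resume_synchronous(), slow.log)  # 0 ['Resume timed out']
    slow.scheduled.set()  # the real stop now arrives after the caller moved on

    # Worker is scheduled promptly: the call blocks until the new stop.
    fast = ToyProcess(ack_timeout=5.0)
    fast.scheduled.set()
    print(fast.resume_synchronous(), fast.log)  # 1 []
```

Holding `scheduled` unset plays the role of a heavily loaded machine: whether the caller returns early depends only on whether the worker runs before the bounded wait expires, so a defect of this shape would stay hidden on an idle machine. In this model the early return also always leaves the timeout message in the log.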
I’ve been looking, but I haven’t found anything relevant along this code path. There is one hard-coded timeout on it: the 5-second wait, in ProcessGDBRemote::DoResume, between sending eBroadcastBitAsyncContinue with the continue packet to the gdb-remote async thread and receiving the ack back. But if that wait timed out, it would write a “gdb-remote process” log message: “Resume timed out”. The process log is on, and I see other output from it, but that message doesn’t appear in the failure transcript. Weird.

Jim

> On Oct 1, 2020, at 5:26 AM, Raphael “Teemperor” Isemann <teempe...@gmail.com> wrote:
>
> +1, I have two machines with very similar setup where only the one that is
> under heavy load sees the test failures.
>
> - Raphael
>
>> On 1 Oct 2020, at 14:24, Pavel Labath <pa...@labath.sk> wrote:
>>
>> On 30/09/2020 23:21, Jim Ingham wrote:
>>> The test doesn’t seem to be flakey in the “run it a bunch of times and
>>> it will eventually fail” type flakey. I ran the test 200 times on my
>>> machine and didn’t get a failure.
>>
>> Actually, it seems like exactly the typical kind of flaky test to me --
>> it mostly works when run on its own, but starts failing as soon as the
>> system comes under load.
>>
>> It didn't fail for me either for over 100 iterations. However, as soon
>> as I cranked up the cpu load (compiling llvm is good at that), it failed
>> almost immediately.
>>
>> It also doesn't seem to be related to the way the stop hook resumes the
>> process.
>> <http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9516/steps/test/logs/stdio>
>> is one example where the auto_continue version of the test fails, and I
>> have seen both tests fail on my machine.
>>
>> I have some traces of failing and successful runs of the test (will send
>> them to you in a private email). I didn't dive too deeply, but the
>> problem does not seem to be related to python stop hooks. It looks more
>> like a general stop hook bug, which we've had problems with in the past.
>>
>> The problem seems to be that the process.Continue() on the main thread
>> returns early, and so the subsequent checks (for the topmost frame etc.)
>> execute concurrently with the "step out" action. In the "Failure" file
>> I'll send you, you can see that (line 9222) SBFrame::GetFunctionName is
>> called before the inferior process stops in the main function (the
>> processing of that happens immediately after that line, on the
>> "intern-state" thread).
>>
>> pl

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits