Re: long I/O delays when strace is running

Daniel Santos Sun, 23 Apr 2017 15:46:33 -0700

Unfortunately, I don't have much time to spend on this issue, as GCC 8 stage 1 has started and I have a few more issues to clear up with my patchset.


On 04/23/2017 02:42 AM, Mark Geisert wrote:

Anyway, I can see that the strace process's shared _pinfo object is never fully
populated:
_pinfo 0x30000 {
   pid 2800,
   process_state 0x00000001,
   ppid 0,
   exitcode 0
   cygstarted 0,
   dwProcessId 0x00000AF0,
   progname "D:\cygwin64\bin\strace.exe",
   uid 0,
   gid 0,
   pgid 0,
   sid 0,
   ctty 0,
   has_pgid_children 0
   start_time 1492881370,
   nice 0,
   stopsig 0,
   sendsig 0x0,
   exec_sendsig 0x0,
   exec_dwProcessId 0
}
Again, strace.exe is a Windows executable, so perhaps some of those fields don't make sense for a non-Cygwin process and are not initialized? Purely speculation on my part.

Oh, I understand now, thanks. :) So it doesn't link to cygwin1.dll (or any other cygwin libs), that makes sense. So the flaw is probably thinking that this executable *should* have uid, gid, ppid, etc. Yet, it exists in the cygwin process database (apparently a bunch of shared (probably anonymous) files?). So the mistake is either listing it in the database or not accounting for the possibility of strace, the semi-cygwin program? Maybe there should be (or is?) a flag to tell readers of the cygwin process database that this is a "special case" process?

So I would venture to say that is a problem.  Also, pinfo::init() should
probably issue some error message if it waits 2-ish seconds and the struct
still isn't correctly populated.
That seems right. I unfortunately don't know why the code presumes the struct is always populated within a certain (small) amount of time. The complaint comment about minimum possible sleep duration sure makes it seem like it's always supposed to be populated very quickly.
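To make the suggestion above concrete, here is a minimal sketch of a bounded wait that reports when the record is still unpopulated at the deadline. It is not Cygwin's pinfo::init(); the `shared_record` type, its `ready` flag, and `wait_until_populated()` are hypothetical names invented for illustration, and the real code may use a different "populated" test entirely.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-in for a shared process record (NOT Cygwin's _pinfo).
// Assumption: a nonzero `ready` flag means the writer finished populating it.
struct shared_record {
    std::atomic<unsigned> ready{0};
};

// Poll until the record is populated or `timeout` expires.
// Returns true if populated; otherwise warns on stderr and returns false,
// so the caller is never left waiting silently.
bool wait_until_populated(const shared_record &rec,
                          std::chrono::milliseconds timeout,
                          std::chrono::milliseconds step = std::chrono::milliseconds(1))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (rec.ready.load(std::memory_order_acquire) != 0)
            return true;
        if (std::chrono::steady_clock::now() >= deadline)
            break;
        std::this_thread::sleep_for(step);
    }
    std::fprintf(stderr, "warning: record still unpopulated after %lld ms\n",
                 static_cast<long long>(timeout.count()));
    return false;
}
```

A caller would pass something like `std::chrono::milliseconds(2000)` for the 2-ish second wait discussed above and decide from the return value whether to treat the process as a special case.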

Yes, and not knowing cygwin's architecture it's hard for me to guess why, although I can do a git blame and try to understand when the code was put in. Also, anything like this usually screams race condition in my ear, but I can't say that w/o really understanding it well and what assumptions are being made. For instance, if another thread/process could really modify this then reads should be done using known atomic instructions. On 32-bit x86, iirc, a naturally aligned mov of the machine word size is always atomic, i.e., either you get an intact old value or you get an intact new value; you should never get two bytes of the new value and two bytes of the old. But when I'm writing C code, I never want to presume what the compiler will emit for situations like this, and it's better to use some atomic read/write macro/inline. Even though I can't really imagine this particular snippet not using a simple mov, using an explicit "atomic" function/macro conveys the intention.
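As an illustration of the "explicit atomic read/write" idea, here is a small sketch using the standard C++11 `std::atomic` facilities rather than any Cygwin macro. The `shared_slot` type and the `publish()`/`observe()` helpers are hypothetical names, and the sketch assumes the field is a single machine word.

```cpp
#include <atomic>

// Fail at compile time if plain `unsigned` atomics would need a lock,
// since the point is a single untorn load/store of one machine word.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "unsigned atomics must be lock-free");

// Hypothetical shared field; not taken from Cygwin's sources.
struct shared_slot {
    std::atomic<unsigned> value{0};
};

// Writer side: the store is indivisible, and release ordering makes
// earlier writes visible to a reader that observes this value.
inline void publish(shared_slot &s, unsigned v)
{
    s.value.store(v, std::memory_order_release);
}

// Reader side: returns either an intact old value or an intact new one,
// never a mix of bytes from both.
inline unsigned observe(const shared_slot &s)
{
    return s.value.load(std::memory_order_acquire);
}
```

On x86 these calls will most likely compile down to the same plain mov, so the cost is nil; the gain is that the intent is stated in the source and no longer depends on what the compiler happens to emit.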

I should note that in the case of trying to analyze this problem with expect, I allowed a make -kj8 check to run for a few days (should take maybe 4 hours) and I never had the race condition. Presumably, if I allowed it to run for a very long period of time (months or years) it would have likely occurred.

Is there a way to debug the children of strace? It would make it a lot easier.
That's part of why I wrote the _pinfo::debug(), but also when I debug strace with gdb, the _pinfo struct IS properly populated. The best problems are the
ones that disappear when you try to debug them.
strace always acts as the debugger of the target process you start strace with (or attach to; see '-p' in strace's help). strace has a switch '-f' == '--trace-children' that defaults to being ON. So by default strace is getting DEBUG_EVENTs from the target strace launched as well as any process the target creates.
If you explicitly set the '-f' flag, you're actually turning OFF that default and *only* the target process sends DEBUG_EVENTs. In that case any process the target creates will be invisible to strace. You could conceivably debug those sub-processes with gdb but you likely won't be able to catch them at their startup unless they wait for your attach.

Very interesting! Is it possible to have two processes debugging and have strace forward debug events that it isn't interested in to another debugger in the chain? I'm probably just talking crazy here. Either way, that's ancillary to fixing the problem.


Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
