On Thu, Mar 04, 2004 at 10:59:48AM -0500, Christopher Faylor wrote:
>On Wed, Mar 03, 2004 at 09:14:28PM -0500, Christopher Faylor wrote:
>>On Wed, Mar 03, 2004 at 06:16:55PM -0500, Rolf Campbell wrote:
>>>Christopher Faylor wrote:
>>>>>>No, but I'll try to catch one. (I removed the strace from my script.)
>>>>>
>>>>>Ok, caught two already. (Produced with attached script + Makefile)
>>>>
>>>>Not much there, unfortunately.
>>>>
>>>>Out of curiosity, can you duplicate this problem with the snapshot? I
>>>>see that this is your own build, probably built with
>>>>--enable-debugging.
>>>>
>>>>I've been diligently testing things with the snapshot rather than my
>>>>own build because I was trying to debug what was in the subject.
>>>>Snapshots aren't built with --enable-debugging. If this is just an
>>>>artifact from building with --enable-debugging, then I'm not too
>>>>worried.
>>>
>>>Ok, I've been running the script with the '25 snapshot all day, with 44
>>>failures. All the same type of failures I was seeing with the cvs
>>>(with --enable-debugging). Unfortunately, the ethernet card on my home
>>>machine broke so for now I'll upload one of the strace files to a
>>>geocities site. Nothing looks suspicious to me in the strace, maybe
>>>it's a bug in make? http://www.geocities.com/endlisnis/Temp/freeze.zip
>>
>>Thanks. Unfortunately, I don't see anything more here than in the other
>>strace output.
>>
>>I did manage to duplicate this after 1437 repetitions or so. My strace
>>didn't show anything either, unfortunately, but now maybe I can slowly
>>get to the bottom of the problem.
>
>Weird. Now that I've managed to duplicate it, I can do so at will. I
>guess that's good news.
>
>I see what is causing the symptom but not what is causing the problem.
>I spent a sleepless night modelling multi-threaded signal interrupts
>in my head but I'm still not any closer to understanding the problem.
>
>The problem is that malloc allocates some memory, puts the address of
>the memory in the eax register, and then returns. In the meantime, two
>signals have come in, so rather than return immediately, malloc returns
>to the signal handler and then the signal handler is called again. In
>some cases, this causes the eax register to become zero and so make
>(rightly) complains. In theory, this shouldn't happen since the eax
>register should have been saved on the stack.
>
>Nope. Typing an explanation doesn't help me figure this out. Bummer.

I think I may have figured this out. It wasn't the eax register being
zeroed. It was actually the test for zero returning improper values due
to being interrupted by a signal.

I made a fix last night that allowed me to run this for 2500+ iterations.
Of course, I have managed to do that before without error, so that
doesn't mean much, I guess. Backing the change out resulted in a
'virtual memory exhausted' error in less than a hundred iterations,
however.

Odd that I can duplicate it so readily now. I think my computer was
previously trying to shield me from the pain of debugging this problem.

There is a new snapshot up now with my fix in it. Please try it.

cgf
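
The diagnosis above (a result is produced, a signal arrives, and the later test of that result sees the wrong state) has a portable C-level analogue in errno. The sketch below is not the Cygwin fix, which this message does not show. It only illustrates the general pattern of a handler saving and restoring state that the interrupted code is about to test. The names preserving_handler, clobbering_handler, errno_after_signal and signals_seen are hypothetical, and SIGUSR1 is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical counter, only here so the handlers do some visible work. */
static volatile sig_atomic_t signals_seen = 0;

/* Handler that restores the state it disturbs before returning. */
static void preserving_handler(int signo)
{
    int saved_errno = errno;   /* state the interrupted code may test next */
    (void)signo;
    signals_seen++;
    errno = EINTR;             /* stand-in for handler work that changes errno */
    errno = saved_errno;       /* put it back before resuming the interrupted code */
}

/* Same handler without the save/restore: the interrupted code sees EINTR. */
static void clobbering_handler(int signo)
{
    (void)signo;
    signals_seen++;
    errno = EINTR;
}

/*
 * Install `handler` for SIGUSR1, deliver one signal between setting errno
 * and reading it back, and return the errno value the "interrupted" code
 * observes. Returns -1 if the handler could not be installed.
 */
static int errno_after_signal(void (*handler)(int))
{
    struct sigaction sa, old;
    int observed;

    assert(handler != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    errno = 0;                 /* the "result" the caller is about to test */
    raise(SIGUSR1);            /* the handler runs before raise() returns */
    observed = errno;          /* the test that a careless handler corrupts */

    sigaction(SIGUSR1, &old, NULL);
    return observed;
}
```

Calling errno_after_signal(preserving_handler) should leave the observed value untouched, while errno_after_signal(clobbering_handler) shows the test seeing a value the interrupted code never produced.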