On Monday 16 April 2007 13:27, Joachim Schipper wrote:
> On Mon, Apr 16, 2007 at 12:33:09PM -0700, J.C. Roberts wrote:
> > On Monday 16 April 2007 12:06, Maurice Janssen wrote:
> > > On Monday, April 16, 2007 at 11:30:29 -0700, Bryan Vyhmeister wrote:
> > > >On Apr 16, 2007, at 10:39 AM, J.C. Roberts wrote:
> > > >>I've never seen the "alpha bug" on my DS20L (equivalent to the
> > > >>CS20) or
> > > >>my 500/500 but I have seen it on my PC* boxes. Other people
> > > >> have had the exact opposite experience. The only time I've hit
> > > >> the bug was during system builds and in contrast, others have
> > > >> reported hitting the bug at other times during normal
> > > >> operation. -- The trouble is, when you have a strange
> > > >> "mystery bug" floating out there, it may or may not be
> > > >> correctly blamed for any and all problems.
> > > >
> > > >Thank you for the followup. I guess I will just try and see what
> > > >happens. I should dig out my PC164 whatever box and see if it
> > > >exhibits the issue.
> > >
> > > FWIW: the bug seems to occur at my 3000/300X, but only during
> > > heavy load like 'make build'. I never finished such a build, but
> > > I only tried a few times.
> > >
> > > Maurice
> >
> > I just thought of something which might be worth a try on systems
> > that show the bug during system builds; use nice(1) to lower the
> > build priority. It's a long shot, and I haven't tried it, but it
> > *might* be a useful work around. Then again, it might be a waste of
> > time.
>
> Just curious: why do you think this helps? It's not like nice'ing the
> only process on the box that uses any real resources helps, does it?
>
> Joachim

As I tried to say, I'm *really* unsure whether it would help at all, but in theory it might, so it seems worth a shot. As I said earlier, some suspect the "alpha bug" is some type of race condition, but there is no proof of that assumption.

There are many different types of race conditions, but in short they are all some form of timing issue. Often the race condition exists as a timing issue *between* two or more threads/processes. When you mess with the priority of a process (and everything it spawns), you change how much time each gets on the processor, and hence change the overall timing dynamics.

If the race is in a linear code path, changing the priority/timing will do nothing for you and you'll just hit the same bug, albeit more slowly. On the other hand, if the race is between processes/threads, interesting things can happen when you change the overall timing dynamics (disk timing, cache timing/hits/misses, memory timing, ...).

Jacking the priority one way or another on different chunks of code, or even all of it, is one of the ways you can isolate a race problem through trial and error. Is it a foolproof method? -Nope. Can it be a complete waste of time? -Yep. Is it an indication that you're totally desperate? -Yes, most certainly.

The trouble is, most race conditions I've seen (*cough* written) are inconsistent, and you get drop-kicked into some random place. In contrast, the "alpha bug" always used to drop me into the same exact place when doing system builds (kernel builds, to be exact). It was so consistent on the alpha machine I donated to the SBCL project that I considered putting shell code in the source file. ;-)

-jcr
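For anyone unfamiliar with the "race between threads" case described above, here is a minimal C sketch of a lost-update race. It is a user-space illustration only, assuming a POSIX system with pthreads and C11 <stdatomic.h>; it has nothing to do with the actual alpha bug, and the names (ITERATIONS, racy_worker, safe_worker, run_pair) are hypothetical. The racy counter splits each increment into a separate load and store, so whether updates get lost depends purely on how the two threads happen to interleave, which is exactly the sort of timing that scheduling changes can disturb.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical demo parameter; not taken from any real code. */
#define ITERATIONS 1000000L

/* Racy counter: each individual access is atomic, but an "increment"
 * is a separate load followed by a separate store, so the whole
 * read-modify-write is not. Two threads can load the same old value
 * and both store old+1, losing one update. */
static atomic_long racy_counter;

/* Safe counter: the whole read-modify-write happens under a mutex. */
static long safe_counter;
static pthread_mutex_t safe_lock = PTHREAD_MUTEX_INITIALIZER;

static void *racy_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        long v = atomic_load_explicit(&racy_counter, memory_order_relaxed);
        /* If the other thread updates the counter between this load and
         * the store below (preemption or a second CPU), an update is lost. */
        atomic_store_explicit(&racy_counter, v + 1, memory_order_relaxed);
    }
    return NULL;
}

static void *safe_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&safe_lock);
        safe_counter++;
        pthread_mutex_unlock(&safe_lock);
    }
    return NULL;
}

/* Start two threads running the same worker and wait for both. */
static void run_pair(void *(*worker)(void *))
{
    pthread_t a, b;
    int rc;

    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
}
```

Driven from a small main() that calls run_pair() once per worker (compile with -pthread), the safe counter should always end at 2 * ITERATIONS, while the racy total can vary from run to run and machine to machine. Running the same binary under nice(1) or alongside other load may shift the interleavings and therefore how many updates are lost, which is the kind of timing sensitivity the priority trick relies on.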