On Fri, Jul 22, 2005 at 07:53:00PM -0700, Danny Howard wrote:
> On Fri, Jul 22, 2005 at 02:53:57PM -0500, Karl Denninger wrote:
> [...]
> > Note carefully from this that there is NO ERROR INDICATION AS TO WHY THE
> > DISK DETACHED!
> >
> > At least with the 5.x problems you'd SEE an error before it went BOOM.
> >
> > This time around, nope - just death.
> >
> > What's worse, the complaints continue even through a shutdown ...
>
> While I agree with Karl that introducing instability is a very bad
> thing, I guess we now have an answer to Karl's vexation yesterday:
> [ http://lists.freebsd.org/pipermail/freebsd-stable/2005-July/017210.html ]
>
> "What I don't understand Robert is why Soren's code is "too
> sensitive" to commit, but the explosive reduction in stability
> that the changes made between 4.x and 5.3 caused weren't
> enough to back THAT out until it could be fixed."
>
> The answer would seem to be that when someone actually does test the
> untested code, it is even worse than the code we are already upset with.
> :)
>
> Love,
> -danny

Point taken.

Can we get a COMMITMENT from the development team that 6.x will NOT go
out the door until this problem is identified and FIXED (e.g. the PR I
submitted against this early in the year is closed)?

The problem is trivially easy to reproduce, as I've pointed out. My
hardware is hardly anything special - it's a Dell PowerEdge 400SC, a
rather pedestrian 2.4GHz P4/HT machine with 512MB of RAM and nothing
special in terms of boards in it. Indeed, on the sandbox machine the ONLY
cards in the machine are the Adaptec SATA card and a video board!

The ICH SATA onboard adapter works fine. No problems, even if you beat
the snot out of the disks. Ditto for the onboard PATA channels.

ANY PCI SII-chipset SATA card (nothing fancy here, no onboard RAID, just
a disk adapter) that I've tried thus far - Bustek or Adaptec - causes
trouble in an absolutely reproducible fashion when put under heavy load.
If both channels are in use the trouble is immediate and dramatic,
although you CAN provoke errors even with only one of the two channels in
operation if you can get the I/O load up high enough.

Gmirror is great for provoking this as it queues traffic to both channels
in a nicely balanced and heavily-utilized fashion, although I'm willing
to bet that Gmirror itself is not involved as the actual cause of the
problem, since I had trouble once DURING install (before I had put a
gmirror'ed config on the disks). Note that a MIX of reads and writes
appears to be required - a REBUILD of the disks by Gmirror (which is all
writes to those two disks) succeeds. As soon as you have all three
subdisks in the array, however, a "make buildworld" produces fireworks.

If necessary (or useful) I can give one or more developers a way to log
into the sandbox machine here via ssh. I do not have a way to get a
serial console on the box, however, so if it's blown up in an
unrecoverable fashion remotely someone would have to call or IM me to
push the big red button.
If that's NOT necessary (or desired), then I want to move those two disks
back to the production machine as they are how my offsite/offline backups
are done - I've no problem with leaving them on the sandbox IF the
problem is being actively worked on, though.

--
--
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net      My home on the net - links to everything I do!
http://scubaforum.org         Your UNCENSORED place to talk about DIVING!
http://homecuda.com           Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com  Musings Of A Sentient Mind
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"