On Mon, Aug 25, 2008 at 08:17:55PM +1200, Ian Collins wrote:
> John Sonnenschein wrote:
> >
> > Look, yanking the drives like that can seriously damage the drives
> > or your motherboard. Solaris doesn't let you do it ...

Haven't seen an android/"universal soldier" shipping with Solaris ... ;-)

> > and assumes that something's gone seriously wrong if you try it. That Linux
> > ignores the behavior and lets you do it sounds more like a bug in linux
> > than anything else.

Not sure that everything which can't be understood is "likely a bug" - maybe
Linux is simply more forgiving and tries its best to solve the problem without
taking you out of business (see below), even if that requires some hacks not
in line with the specifications ...

> One point that's been overlooked in all the chest thumping - PCs vibrate
> and cables fall out. I had this happen with an SCSI connector. Luckily

Yes - and a colleague told me that he had the same problem once. He also
managed a Fujitsu-Siemens server where the SCSI controller card had a tiny
hairline crack: very odd behavior, usually not reproducible; IIRC, the 4th
service engineer finally replaced the card ...

> So pulling a drive is a possible, if rare, failure mode.

Definitely! And being prepared for strange controller (or hardware in
general) behavior is possibly a big plus for an OS which targets SMEs and
"home users" as well (everybody knows the Far East and other cheap HW
producers, which sometimes seem to say: let's ship it, later we'll build a
special driver for MS Windows which works around the bug/problem ...).

"Similar" story: around 2000+ we had a workgroup server with 4 PATA IDE
channels and one HDD on each: HDD0 on CH0 mirrored to HDD2 on CH2, HDD1 on
CH1 mirrored to HDD3 on CH3, using the Linux software RAID (md) driver. We
found out that when HDD1 on CH1 went on the blink, for some reason the
controller went on the blink as well, i.e. it took CH0 down too (and vice
versa). After a reboot we were able to force the md RAID to re-accept the
drives it had marked as bad, and we even found out that the problem only
started when a certain part of one partition was accessed (which made
operations on that RAID really slow for some minutes - but once the driver
had marked the drive(s) as bad, performance was back). So disabling that
partition gave us the time to get a new drive ... During all these ops nobody
(except the sysadmins) realized that we had a problem - thanks to the md
RAID1 (with XFS, btw.). And we did not have any data corruption either (at
least nobody has complained about it ;-)).
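For the curious, roughly what that looked like - just a sketch from memory;
the device names and today's mdadm syntax are illustrative assumptions, not
the original commands we ran back then:

  # Cross-channel RAID1 pairs, one disk per IDE channel
  # (CH0=hda, CH1=hdc, CH2=hde, CH3=hdg):
  #   md0: HDD0 (CH0) <-> HDD2 (CH2)
  #   md1: HDD1 (CH1) <-> HDD3 (CH3)
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hde1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 /dev/hdg1

  # After the reboot: force-assemble so md re-accepts the members
  # it had marked as failed/out of date:
  mdadm --assemble --force /dev/md0 /dev/hda1 /dev/hde1
  mdadm --assemble --force /dev/md1 /dev/hdc1 /dev/hdg1

  # Once the replacement disk was in: fail/remove the flaky one,
  # add the new one (same device name) and let md resync the mirror:
  mdadm /dev/md1 --fail /dev/hdc1 --remove /dev/hdc1
  mdadm /dev/md1 --add /dev/hdc1
  cat /proc/mdstat        # watch resync progress

Pairing the mirror halves across channels is what kept both arrays merely
degraded instead of dead when two channels acted up at the same time.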
Wrt. what I've experienced and read on the zfs-discuss etc. lists, I have the
__feeling__ that we would have gotten into real trouble using Solaris (even
the most recent one) on that system ... So if somebody asks me whether to run
Solaris+ZFS on a production system, I usually say: definitely, but only if it
is a Sun server ...

My 2¢ ;-)

Regards,
jel.

PS: And yes, all the vendor-specific workarounds/hacks are a problem for the
Linux kernel folks as well - at least on Torvalds' side they are discouraged,
IIRC ...
-- 
Otto-von-Guericke University          http://www.cs.uni-magdeburg.de/
Department of Computer Science        Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany              Tel: +49 391 67 12768
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss