Vincent Fox wrote:
> Ummm, could you back up a bit there?
>
> What do you mean "disk isn't sync'd so boot should fail"? I'm coming from
> UFS of course where I'd expect to be able to fix a damaged boot drive as it
> drops into a single-user root prompt.
>
> I believe I did try boot disk1 but that failed I think due to prior trial
> with it, where I scrambled it with dd, then resilvered. Then removed it,
> replaced, resilvered it. Think I ended up with unusable boot sector on disk1
> that didn't work but I didn't copy the message down sorry.
>
> I suppose all that would have been left is boot from media or jumpstart
> server in single-user and attempt repairs. Unfortunately I have since
> re-jumpstarted the system clean. This was plain nv90 both times by the way
> no /etc/system tweaks.
>
> I have to pull the motherboard on the V240 and replace it tomorrow, maybe on
> Friday I will be able to repeat my experiment. Just wanted to run through
> some failure-modes so I know what to expect when boot drives die on me.
>

Sequence-of-events failures are among the most common fatal errors in
complex systems. In this case, you induced a failure mode we call amnesia.
It works like this:

Consider a system with two (!) mirrored disks (A and B) working normally
and in sync.

At time0, disconnect disk A. It still contains a view of the system state,
but it is no longer accessible to the system.

At time1, the system gives up on disk A and proceeds using disk B. Now the
two disks are no longer in sync, and the data on disk B is newer than the
data on disk A.

At time2, shut down the system and re-attach disk A.

The correct behaviour is to treat disk A as stale and ignore its data until
it is repaired. Disk B should be the primary, authoritative view of the
system state.

This failure mode is called amnesia because disk A doesn't remember the
changes that it would have recorded had it remained an active, functional
member of the system. AFAIK, SVM will not handle this problem well. ZFS and
Solaris Cluster can detect it because their configuration metadata records
the time difference (ZFS can tell from the latest txg).

I predict that if you had booted from disk B, it would have worked (but I
don't have the hardware setup to test this tonight).

NB, for those who don't know about SPARC boot sequences: the OpenBoot
program has a default boot device list and will try the first device, then
the second, and so on. This is similar to how most BIOSes work. You wouldn't
normally need to worry about this, but it makes a difference in the case of
amnesia, because the first device tried may be the stale disk.

 -- richard
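
To make the amnesia sequence concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how ZFS is implemented: each side of the mirror carries a single integer standing in for its latest txg, and the names MirrorSide, commit, classify and replay are hypothetical, made up for this illustration. It replays time0 through time2 and then picks the authoritative side by the newest txg.

```python
from dataclasses import dataclass


@dataclass
class MirrorSide:
    name: str
    txg: int               # toy stand-in for the last txg written to this disk
    attached: bool = True  # False while the disk is disconnected


def commit(sides):
    """Record one new transaction group on every attached side.

    Detached sides are skipped, so they fall behind (the amnesia).
    """
    live = [s for s in sides if s.attached]
    if not live:
        return
    new_txg = max(s.txg for s in live) + 1
    for s in live:
        s.txg = new_txg


def classify(sides):
    """Split side names into (authoritative, stale) by newest txg."""
    newest = max(s.txg for s in sides)
    authoritative = [s.name for s in sides if s.txg == newest]
    stale = [s.name for s in sides if s.txg < newest]
    return authoritative, stale


def replay():
    """Replay the time0..time2 sequence described above."""
    a = MirrorSide("A", txg=100)
    b = MirrorSide("B", txg=100)
    sides = [a, b]

    a.attached = False  # time0: disk A is disconnected
    commit(sides)       # time1: the system carries on with disk B only
    commit(sides)
    a.attached = True   # time2: shut down, re-attach disk A

    return classify(sides)


if __name__ == "__main__":
    authoritative, stale = replay()
    print("authoritative:", authoritative)
    print("stale, ignore until repaired:", stale)
```

In this model disk B ends up authoritative and disk A is flagged as stale, which matches the expected behaviour: A's data is ignored until it has been repaired from B. A scheme without a comparable per-disk counter has no way to tell which side is newer.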