>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:
re> it seems the hypervisors try to do crazy things like make the
re> disks readonly, haha!
re> which is perhaps the worst thing you can do to a guest OS
re> because now it needs to be rebooted

I might've set it up to ``pause'' the VM for most failures, and for punts like this read-only case, maybe leave it paused until someone comes along to turn it off or unpause it.

But for loss of connection to an iSCSI-backed disk, I think that's wrong. I guess the truly correct failure handling would be to power off the guest VM immediately: pausing it tempts the sysadmin to fix the iSCSI connection and unpause it, which in this case is the one truly disaster-begging thing to do. One would get a lot of complaints from sysadmins who don't understand the iSCSI write hole, but I think it's right.

So...in that context, maybe read-only-until-reboot is actually not so dumb!

For guests unknowingly getting their disks via NFS, it would make sense to pause the VM to stop (some of) its interval timer(s), and hope the timer running the ATA/SCSI/... driver is among the stopped ones, because the guest's disk driver won't understand NFS hard-mount timeout rules. It won't understand that for certain errors you can pass ``stale file handle'' up the stack, but for other errors you must wait forever. Instead it will enforce a 30-second timeout, as it would for an ATA disk.

I think you could probably still avoid losing the 'write B' if the guest fired its ATA timeout with an NFS-backed disk, because the writes have already been handed off to the host. It might be a weird user experience in the VM manager, because whatever process is doing the NFS writes will be stuck in unkillable 'D' state even if you power off the VM, but this weirdness is an expression of arcane reality, not a bug.
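To make the policy argued for above concrete, here is a minimal sketch of a per-failure action table. All names (`BackendFailure`, `VMAction`, `action_for`) are hypothetical and do not correspond to any real hypervisor's API; it only encodes the choices in this post: power off on iSCSI loss, pause-and-resume for a stalled NFS backend, and pause-until-an-operator-decides for other punts.

```python
from enum import Enum, auto


class BackendFailure(Enum):
    """Hypothetical classes of virtual-disk backend failure (illustration only)."""
    ISCSI_CONNECTION_LOST = auto()   # iSCSI-backed disk lost its connection to the target
    NFS_SERVER_UNREACHABLE = auto()  # NFS-backed disk image; the host's hard mount keeps retrying
    OTHER_PUNT = auto()              # anything else the hypervisor can't handle sensibly


class VMAction(Enum):
    POWEROFF = auto()               # stop the guest now and never resume this instance
    PAUSE_AND_AUTO_RESUME = auto()  # freeze the guest, resume once the backend returns
    PAUSE_UNTIL_OPERATOR = auto()   # freeze the guest and leave the decision to a human


def action_for(failure: BackendFailure) -> VMAction:
    """Map a backend failure to the VM action this post argues for."""
    if failure is BackendFailure.ISCSI_CONNECTION_LOST:
        # Unpausing after the connection is repaired is the disaster-begging
        # case (the iSCSI write hole), so don't offer the option at all.
        return VMAction.POWEROFF
    if failure is BackendFailure.NFS_SERVER_UNREACHABLE:
        # Writes already handed to the host survive a hard-mount stall,
        # so freezing and later resuming the guest is safe.
        return VMAction.PAUSE_AND_AUTO_RESUME
    # Default for punts like the read-only case: freeze and wait.
    return VMAction.PAUSE_UNTIL_OPERATOR
```

The point of returning `POWEROFF` rather than a pause for iSCSI is that the table removes the tempting "fix the connection and unpause" path entirely.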
It'd be a better sysadmin experience to avoid the guest ATA timeout, though: pause the VM and resume it, so that NFS server reboots would freeze guests for a while rather than require rebooting them, just as they do for nonvirtual NFSv3 clients. You would have to figure out the maximum number of seconds the guests can go without disk access, and deviously pause them before their buried / proprietary disk timeouts can fire.
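The "pause them before their disk timeouts can fire" idea can be sketched as a small watchdog on the host side. This assumes (hypothetically) that the host sees each guest disk request when it is issued and when it completes, that the guest gives up after `guest_timeout_s` seconds (30 s, the ATA-like figure mentioned above), and that pausing actually stops the timer driving that timeout, which, as noted above, is not guaranteed. The class name and margin are illustrative, not any hypervisor's interface.

```python
from typing import Dict, Optional


class PauseBeforeGuestTimeout:
    """Hypothetical host-side watchdog: pause a guest before its disk timeout fires."""

    def __init__(self, guest_timeout_s: float = 30.0, margin_s: float = 5.0) -> None:
        # guest_timeout_s: assumed guest disk-driver timeout (unknown/proprietary in practice)
        # margin_s: safety margin so the pause lands before the guest gives up
        if margin_s >= guest_timeout_s:
            raise ValueError("margin must be smaller than the guest timeout")
        self.guest_timeout_s = guest_timeout_s
        self.margin_s = margin_s
        self._pending: Dict[int, float] = {}  # request id -> time it was issued (seconds)

    def issued(self, req_id: int, now: float) -> None:
        self._pending[req_id] = now

    def completed(self, req_id: int) -> None:
        self._pending.pop(req_id, None)

    def pause_deadline(self) -> Optional[float]:
        """Latest time to pause the guest, or None if no request is outstanding."""
        if not self._pending:
            return None
        oldest = min(self._pending.values())  # the oldest request's timeout fires first
        return oldest + self.guest_timeout_s - self.margin_s

    def should_pause(self, now: float) -> bool:
        deadline = self.pause_deadline()
        return deadline is not None and now >= deadline


w = PauseBeforeGuestTimeout()
w.issued(1, now=100.0)
print(w.pause_deadline())     # 125.0
print(w.should_pause(120.0))  # False
print(w.should_pause(125.0))  # True
```

Keying on the oldest outstanding request is the conservative choice: it is the one whose guest-side timeout expires first, so once it completes the deadline moves forward to the next-oldest request.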
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss