>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:

    re> it seems the hypervisors try to do crazy things like make the
    re> disks readonly,

haha!

    re> which is perhaps the worst thing you can do to a guest OS
    re> because now it needs to be rebooted

I might've set it up to ``pause'' the VM for most failures, and for
punts like this read-only case, maybe leave it paused until someone
comes along to turn it off or unpause it.  But for loss of connection
to an iSCSI-backed disk, I think that's wrong.  I guess the truly
correct failure handling would be to immediately power off the guest
VM: pausing it tempts the sysadmin to fix the iSCSI connection and
unpause it, which in this case is the only real disaster-begging thing
to do.  One would get a lot of complaints from sysadmins who don't
understand the iSCSI write hole, but I think it's right.  So, in that
context, maybe read-only-until-reboot is actually not so dumb!
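
To make the policy above concrete, here is a rough sketch of the
decision as I'd write it down.  It is not any real hypervisor's API:
the names Backing, Action and on_storage_failure are made up for
illustration, and the only rule it encodes is "pause for most failures,
power off when an iSCSI-backed disk loses its connection."

```python
from enum import Enum


class Backing(Enum):
    """How the host provides the guest's disk (hypothetical categories)."""
    LOCAL = "local"
    ISCSI = "iscsi"
    NFS = "nfs"


class Action(Enum):
    """What the VM manager does to the guest (hypothetical names)."""
    PAUSE = "pause"
    POWER_OFF = "power_off"


def on_storage_failure(backing: Backing, connection_lost: bool) -> Action:
    """Pick a guest action for a storage failure on the host side.

    Assumption: losing the connection to an iSCSI-backed disk is the
    one case where resuming later is dangerous, so the guest is powered
    off instead of paused.  Everything else leaves the guest paused
    until someone turns it off or unpauses it.
    """
    if backing is Backing.ISCSI and connection_lost:
        return Action.POWER_OFF
    return Action.PAUSE
```

The point of returning POWER_OFF rather than PAUSE in the iSCSI case is
that it takes away the tempting "fix the connection and unpause" path.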

For guests unknowingly getting their disks via NFS, it would make
sense to pause the VM to stop (some of) its interval timer(s), (and
hope you get the timer running the ATA/SCSI/... driver among the
stopped ones) because the guest's disk driver won't understand NFS
hard mount timeout rules---won't understand that, for certain errors,
you can pass ``stale file handle'' up the stack, but for other errors
you must wait forever.  Instead they'll enforce a 30-second timeout
like for an ATA disk.  I think you could probably still avoid losing
the 'write B' if the guest fired its ATA timeout with an NFS-backed
disk because the writes have already been handed off to the host.  It
might be a weird user experience in the VM manager because whatever
process is doing the NFS writes will be in unkillable 'D' state even if
you power off the VM, but this weirdness is an expression of arcane
reality, not a bug.  It'd be a better sysadmin experience to avoid the
guest ATA timeout, though: pause the VM and resume so that NFS server
reboots would freeze guests for a while, not require rebooting them,
just like they do for nonvirtual NFSv3 clients.  You would have to
figure out the maximum number of seconds the guests can go without
disk access, and deviously pause them before their buried /
proprietary disk timeouts can fire.
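
The timing part of that could be sketched roughly like this.  It is
only an illustration under assumptions: pause_deadline and should_pause
are hypothetical helpers, the 5-second safety margin is a number I
picked, and the 30-second figure is just the ATA-like timeout mentioned
above.  A real VM manager would also need some way to learn how long
the guest's oldest outstanding I/O has been waiting, which this does
not address.

```python
def pause_deadline(guest_timeout_s: float, safety_margin_s: float = 5.0) -> float:
    """Seconds an I/O may stay outstanding before the guest must be paused.

    guest_timeout_s is the shortest disk timeout any guest driver is
    believed to enforce; safety_margin_s is an assumed cushion so the
    pause lands before that timeout fires.
    """
    if safety_margin_s < 0 or guest_timeout_s <= safety_margin_s:
        raise ValueError("safety margin must be >= 0 and below the guest timeout")
    return guest_timeout_s - safety_margin_s


def should_pause(oldest_io_age_s: float, guest_timeout_s: float,
                 safety_margin_s: float = 5.0) -> bool:
    """True once the oldest stalled I/O has reached the pause deadline."""
    return oldest_io_age_s >= pause_deadline(guest_timeout_s, safety_margin_s)


if __name__ == "__main__":
    # With a 30 s guest timeout and the default margin, pause at 25 s.
    for age in (10.0, 25.0, 29.0):
        print(age, should_pause(age, guest_timeout_s=30.0))
```

Once the NFS server answers again, the guest would be resumed, so from
the inside it looks like a long freeze rather than a disk error.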

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss