Note that the bad disk on the node caused a normal reboot to hang. I also verified that sync from the command line hung. I don't know how ZFS (or Solaris) handles situations involving bad disks...does a bad disk block proper ZFS/OS handling of all IO, even to the other healthy disks?

Is it reasonable to have assumed that after 60 seconds the data would have been on persistent disk even without an explicit sync? I confess I don't know how the underlying layers are implemented. Are there mount options or other config parameters we should tweak to get more reliable behavior in this case?

Hey Peter,

The first thing I would do is see if any I/O is happening ('zpool iostat 1'). If there's none, then perhaps the machine is hung, in which case you'd want to grab a couple of '::threadlist -v 10' outputs from mdb to figure out if there are hung threads.
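For reference, those checks look roughly like this from a root shell on the affected box (a sketch, not verified against your build; these are Solaris admin commands, so run them on the node itself):

```shell
# Watch per-device I/O once a second; all-zero read/write columns
# across several intervals suggest the pool is making no progress.
zpool iostat -v 1

# Dump kernel thread stacks from the live kernel (threads with stacks
# deeper than 10 frames). Capture this a couple of times a minute or
# so apart and compare: threads stuck at the same point are suspects.
echo '::threadlist -v 10' | mdb -k
```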

60 seconds should be plenty of time for the async write(s) to complete. We try to push out a txg (transaction group) every 5 seconds. However, if the system is overloaded, the txgs can take longer.
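For what it's worth, on builds where the interval is exposed as the zfs_txg_timeout tunable (check your build before relying on this), it can be set in /etc/system; a sketch:

```
* /etc/system fragment: txg sync interval in seconds (default is 5).
* Lowering it trades throughput for a smaller window of unsynced data.
set zfs:zfs_txg_timeout = 5
```

This only changes how often ZFS tries to push a txg; it won't help if the pool is wedged behind a bad disk.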

The 'sync' hanging is intriguing. Perhaps the system is just overloaded and the sync command is making it worse. Seeing what 'fsync' does would be interesting.


So far as I've seen, this behavior is reproducible, if someone on the ZFS team wishes to take a closer look at this scenario.

What else is the machine doing?

eric

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
