On 12 apr 2010, at 22.32, Carson Gaspar wrote:

> Carson Gaspar wrote:
>> Miles Nordin wrote:
>>>>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:
>>>
>>> How do you handle the case when a hotplug SATA drive is powered off
>>> unexpectedly with data in its write cache? Do you replay the writes,
>>> or do they go down the ZFS hotplug write hole?
>>
>> If ZFS never got a positive response to a cache flush, that data is
>> still in memory and will be re-written. Unless I greatly misunderstand
>> how ZFS works...
>>
>> If the drive _lies_ about a cache flush, you're screwed (well, you can
>> probably roll back a few TXGs...). Don't buy broken drives / bridge
>> chipsets.
>
> Hrm... thinking about this some more, I'm not sure what happens if the
> drive comes _back_ after a power loss, quickly enough that ZFS is never
> told about the disappearance (assuming that can happen without a human
> cfgadm'ing it back online - I don't know).
>
> Does anyone who understands the internals better than I do care to take
> a stab at what happens if:
>
> - ZFS writes data to /dev/foo
> - /dev/foo loses power, dropping the data from the above write, not yet
>   flushed to rust (say a field tech pulls the wrong drive...)
> - /dev/foo powers back on (field tech quickly goes whoops and plugs it
>   back in)
>
> In the case of a redundant zpool config, when will ZFS notice the
> uberblocks are out of sync and repair? If this is a non-redundant
> zpool, how does the response differ?
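As far as I understand it, the reason an unacknowledged flush is harmless
is an ordering argument: the host keeps its copy of the data until the
flush positively succeeds. Roughly like this (a hypothetical sketch, not
the actual ZFS code; disk_write(), disk_flush() and disk_write_uberblock()
are made-up stand-ins for the real I/O path):

/*
 * Hypothetical sketch of the flush-before-commit ordering being
 * discussed.  Data stays in host memory until the disk positively
 * acknowledges a cache flush, so an unacknowledged flush can simply
 * be retried and the writes replayed.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stubbed device operations; a real implementation would issue
 * SYNCHRONIZE CACHE (SCSI) or FLUSH CACHE EXT (ATA) for the flush. */
static bool disk_write(const void *buf, size_t len) { (void)buf; (void)len; return true; }
static bool disk_flush(void) { return true; }
static bool disk_write_uberblock(void) { return true; }

/* Commit one transaction group.  Returns true only once the new
 * on-disk state is known to be durable. */
static bool txg_commit(const void *txg_data, size_t len)
{
    /* 1. Push the txg's blocks toward the disk's write cache. */
    if (!disk_write(txg_data, len))
        return false;   /* txg_data is still in memory: retry later */

    /* 2. Barrier: nothing is durable until this flush succeeds. */
    if (!disk_flush())
        return false;   /* no ack => assume nothing reached the platter */

    /* 3. Only now is it safe to write the uberblock that references
     *    the new blocks... */
    if (!disk_write_uberblock())
        return false;

    /* 4. ...and flush again so the uberblock itself is durable. */
    if (!disk_flush())
        return false;

    return true;        /* the in-memory copy of txg_data may be freed */
}

int main(void)
{
    char txg_data[] = "example transaction group payload";

    /* A failed commit leaves txg_data in memory, so the writes are
     * simply replayed on the next attempt - no write hole. */
    while (!txg_commit(txg_data, sizeof(txg_data)))
        ;
    puts("txg durable");
    return 0;
}

A drive that lies about step 2 breaks the whole argument, which is
Carson's point about broken drives and bridge chipsets.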
To be safe, the protocol needs to be able to discover that a device (host
or disk) has been disconnected and reconnected, or has been reset, and in
that case each side's assumptions about the state of the other have to be
invalidated. I don't know enough about either SAS or SATA to say whether
they guarantee that you will be notified. But if they don't, they aren't
safe for cached writes.
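For what it's worth, here is a toy version of the kind of mechanism I
mean - nothing that real SAS or SATA actually defines, every name below
is made up for illustration: a generation counter bumped on every
reset or replug, checked before any cached write is trusted.

/*
 * Illustrative sketch only.  Writes remember the generation they were
 * issued under; a flush is only believed if the generation is unchanged,
 * otherwise all in-flight state is invalidated and replayed.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct device {
    uint64_t generation;    /* bumped by the transport on reset/replug */
};

struct pending_io {
    uint64_t issued_gen;    /* device generation when the write was sent */
};

/* Called on hotplug or reset events from the transport layer. */
static void device_reset_event(struct device *dev)
{
    dev->generation++;      /* invalidates every outstanding pending_io */
}

/* May the host believe this write survived into the disk's cache? */
static bool io_still_valid(const struct device *dev,
                           const struct pending_io *io)
{
    /* If the device went away (power loss included) between the write
     * and the flush, the generations differ and a replay is required. */
    return io->issued_gen == dev->generation;
}

int main(void)
{
    struct device dev = { .generation = 1 };
    struct pending_io io = { .issued_gen = dev.generation };

    device_reset_event(&dev);   /* field tech yanks and replugs the drive */

    printf("write %s\n",
           io_still_valid(&dev, &io) ? "durable" : "must be replayed");
    return 0;
}

Without something at least this strong, the host can go on trusting
writes that died in a powered-off cache.

/ragge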