On 07/27/09 01:27 PM, Eric D. Mudama wrote:

Everyone on this list seems to blame lying hardware for ignoring
commands, but disks are relatively mature and I can't believe that
major OEMs would qualify disks or other hardware that willingly ignore
commands.
You are absolutely correct, but if the cache flush command never makes
it to the disk, the disk will never see it. The contention is that by
not relaying the cache flush to the disk, VirtualBox caused the OP to
lose his pool.

IMO this argument is bogus because, AFAIK, the OP didn't actually power
his system down, so the data would still have been in the cache and
presumably would eventually have been written. The out-of-order-writes
theory is also somewhat dubious, since he was able to write 10TB without
VB relaying the cache flushes. This is all highly hardware dependent,
and AFAIK no one ever asked the OP what hardware he had; instead he was
blasted for running VB on MS Windows. Since IIRC he was using raw
disk access, it is questionable whether MS was to blame, but
in general it simply shouldn't be possible to lose a pool under
any conditions.

It does raise the question of what happens in general when a cache
flush never reaches the disk: for example, if a system crashes in such
a way that it requires a power cycle to restart, the cache never gets
flushed. Do disks with volatile caches attempt to flush the cache
on their own if they detect power loss? It seems that the ZFS team
recognizes this as a problem, hence the CR to address it.
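For what it's worth, a minimal Python sketch of the application side of this
contract: fsync() asks the kernel to push the data to the device, but whether
the drive's *volatile* write cache is also flushed depends on the filesystem
(ZFS issues a cache-flush command as part of its commit) and on every layer
in between actually relaying that command, which is exactly the VB issue above.
The filename here is made up for illustration.

```python
import os
import tempfile

# Write a record and ask that it reach stable storage.
path = os.path.join(tempfile.mkdtemp(), "journal.log")
with open(path, "wb") as f:
    f.write(b"txn 42: commit\n")
    f.flush()             # push Python's userspace buffer to the kernel
    os.fsync(f.fileno())  # ask the kernel to flush down to the device

# fsync() returning is only as good as the stack beneath it: if some
# layer (a VM, a RAID controller, the drive itself) drops or ignores
# the cache flush, the data may still sit in a volatile cache.
with open(path, "rb") as f:
    print(f.read())       # b'txn 42: commit\n'
```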

It turns out (at least according to this almost 4 year old blog,
http://blogs.sun.com/perrin/entry/the_lumberjack) that the ZILs
/are/ allocated recursively from the main pool.  Unless there is
a ZIL for the ZILs, ZFS really isn't fully journalled, and this
could be the real explanation for all the lost pools and/or file
systems. It would be great to hear from the ZFS team that writing
a ZIL record, presumably a transaction in its own right, is protected
somehow (by a ZIL for the ZILs?).

Of course the ZIL isn't a journal in the traditional sense, and
AFAIK it has no undo capability the way a DBMS usually has,
but it needs to be structured so that the bizarre things that happen
when something as robust as Solaris crashes don't cause data loss.
The nightmare scenario is when one disk of a mirror begins to
fail and the system comes to a grinding halt where even Stop-A
doesn't respond, and a power cycle is the only way out. Who
knows what writes may or may not have been issued, or what the
state of the disk cache might be, at such a time.

-- Frank

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss