On 07/27/09 01:27 PM, Eric D. Mudama wrote:
> Everyone on this list seems to blame lying hardware for ignoring commands, but disks are relatively mature and I can't believe that major OEMs would qualify disks or other hardware that willingly ignore commands.
You are absolutely correct, but if the cache flush command never makes it to the disk, the disk can't honor it. The contention is that by not relaying the cache flush to the disk, VirtualBox caused the OP to lose his pool. IMO this argument is bogus because, AFAIK, the OP didn't actually power his system down, so the data would still have been in the cache and would presumably have been written eventually. The out-of-order-writes theory is also somewhat dubious, since he was able to write 10TB without VB relaying the cache flushes. This is all highly hardware dependent, and AFAIK no one ever asked the OP what hardware he had; instead people blasted him for running VB on MSWindows. Since IIRC he was using raw disk access, it is questionable whether MS was to blame, but in general it simply shouldn't be possible to lose a pool under any conditions. (A minimal sketch of what such a cache flush looks like from the host side is appended below.)

It does raise the question of what happens in general if a cache flush never happens: if, for example, a system crashes in such a way that it requires a power cycle to restart, the cache never gets flushed. Do disks with volatile caches attempt to flush the cache by themselves if they detect power down? It seems that the ZFS team recognizes this as a problem, hence the CR to address it.

It turns out (at least according to this almost four year old blog entry, http://blogs.sun.com/perrin/entry/the_lumberjack) that the ZILs /are/ allocated recursively from the main pool. Unless there is a ZIL for the ZILs, ZFS really isn't fully journalled, and this could be the real explanation for the lost pools and/or file systems. It would be great to hear from the ZFS team that writing a ZIL, presumably a transaction in its own right, is protected somehow (by a ZIL for the ZILs?). Of course the ZIL isn't a journal in the traditional sense, and AFAIK it has no undo capability the way a DBMS usually has, but it needs to be structured so that the bizarre things that happen when something as robust as Solaris crashes don't cause data loss.

The nightmare scenario is when one disk of a mirror begins to fail and the system comes to a grinding halt where even stop-a doesn't respond, and a power cycle is the only way out. Who knows what writes may or may not have been issued, or what the state of the disk cache might be, at such a time.

-- Frank
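For concreteness, here is a minimal sketch (my illustration, not anything from the OP's setup) of what the "cache flush" under discussion looks like from the host side on Solaris: write to a raw disk device, then issue the DKIOCFLUSHWRITECACHE ioctl, which asks the drive to commit its volatile write cache to the media. ZFS issues the equivalent internally after pushing out a transaction group or ZIL block; if an intermediate layer such as VB acknowledges the flush without passing it on, the data may still be sitting only in the drive's volatile cache even though every call below returns success. The device path and offset are purely illustrative, and the write is destructive if pointed at a real disk.

/*
 * flushdemo.c -- write one block to a raw device, then ask the drive
 * to flush its volatile write cache.  Illustration only: the device
 * path and offset are hypothetical, and the write is destructive.
 *
 * cc flushdemo.c -o flushdemo
 */
#include <sys/types.h>
#include <sys/dkio.h>	/* DKIOCFLUSHWRITECACHE */
#include <fcntl.h>
#include <stropts.h>	/* ioctl() */
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int
main(void)
{
	const char *dev = "/dev/rdsk/c0t0d0s0";	/* hypothetical device */
	char buf[512];
	int fd;

	(void) memset(buf, 0xab, sizeof (buf));

	if ((fd = open(dev, O_RDWR)) < 0) {
		perror("open");
		return (1);
	}

	/* This can complete with the data only in the drive's cache. */
	if (pwrite(fd, buf, sizeof (buf), 0) != (ssize_t)sizeof (buf)) {
		perror("pwrite");
		(void) close(fd);
		return (1);
	}

	/*
	 * Ask the drive to push its write cache to stable storage.
	 * A NULL argument makes the flush synchronous.  If a layer in
	 * between acknowledges this without relaying it to the disk,
	 * the data is not durable despite the successful return.
	 */
	if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) != 0) {
		perror("DKIOCFLUSHWRITECACHE");
		(void) close(fd);
		return (1);
	}

	(void) close(fd);
	return (0);
}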