On 26-Jul-09, at 11:08 AM, Frank Middleton wrote:
On 07/25/09 04:30 PM, Carson Gaspar wrote:
No. You'll lose unwritten data, but won't corrupt the pool, because
the on-disk state will be sane, as long as your iSCSI stack doesn't
lie about data commits or ignore cache flush commands. Why is this so
difficult for people to understand? Let me create a simple example
for you.
Are you sure about this example? AFAIK metadata refers to things like
the file's name, atime, ACLs, etc., etc. Your example seems to be more
about how a journal works, which has little to do with metadata other
than to manage it.
Now if you were too lazy to bother to follow the instructions properly,
we could end up with bizarre things. This is what happens when storage
lies and re-orders writes across boundaries.
On 07/25/09 07:34 PM, Toby Thain wrote:
The problem is assumed *ordering*. In this respect VB ignoring flushes
and real hardware are not going to behave the same.
Why? An ignored flush is ignored. It may be more likely in VB, but it
can always happen.
And whenever it does: guess what happens?
It mystifies me that VB would in some way alter
the ordering.
Carson already went through a more detailed explanation. Let me try a
different one:
ZFS issues writes A, B, C, FLUSH, D, E, F.
case 1) the semantics of the flush* allow ZFS to presume that A, B, C
are all 'committed' at the point that D is issued. You can understand
that A, B, C may be done in any order, and D, E, F may be done in any
order, due to the numerous abstraction layers involved - all the way
down to the disk's internal scheduling. ANY of these layers can
affect the ordering of durable, physical writes _in the absence of a
flush/barrier_.
case 2) but if the flush does NOT occur with the necessary semantics,
the ordering of ALL SIX operations is now indeterminate, and by the
time ZFS issues D, any of the first 3 (A, B, C) may well not have
been committed at all. There is a very good chance this will violate
an integrity assumption (I haven't studied the source so I can't
point you to a specific design detail or line; rather I am working
from how I understand transactional/journaled systems to work.
Assuming my argument is valid, I am sure a ZFS engineer can cite a
specific violation).
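
To make the two cases concrete, here is a toy model in Python. It is
only a sketch of the argument above, not of ZFS or of any real driver:
it assumes a device with a volatile cache that may destage any subset
of cached writes before a crash, and it assumes the integrity rule
"nothing from D, E, F may be durable unless all of A, B, C are". The
names reachable_durable_states and ordering_violations are made up for
this illustration.

```python
from itertools import combinations

OPS = ["A", "B", "C", "FLUSH", "D", "E", "F"]
BEFORE = frozenset("ABC")   # writes issued before the flush
AFTER = frozenset("DEF")    # writes issued after the flush


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def reachable_durable_states(ops, honor_flush):
    """Return every set of writes that could be on stable storage at a crash.

    Toy assumption: writes sit in a volatile cache and the device may
    destage any subset of them, in any order, unless a flush forces them out.
    """
    states = set()
    committed = frozenset()  # made durable by an honored flush
    cached = set()           # issued, but possibly still volatile
    for op in ops:
        if op == "FLUSH":
            if honor_flush:
                committed = committed | cached
                cached = set()
            # An ignored flush returns at once and changes nothing.
        else:
            cached.add(op)
        # A crash right after this op may find any subset of the cache on disk.
        for destaged in subsets(cached):
            states.add(committed | destaged)
    return states


def ordering_violations(states, before=BEFORE, after=AFTER):
    """States where a post-flush write is durable but a pre-flush write is not."""
    return [s for s in states if s & after and not before <= s]


if __name__ == "__main__":
    for honor in (True, False):
        states = reachable_durable_states(OPS, honor)
        bad = ordering_violations(states)
        print(f"flush honored={honor}: {len(states)} states, {len(bad)} violations")
```

In this model an honored flush leaves 15 possible on-disk states and
none of them breaks the rule. With the flush ignored there are 64
possible states and 49 of them have some of D, E, F on disk without all
of A, B, C. Whether a given one of those 49 actually damages a pool
depends on design details I have not checked.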
As has already been mentioned in this context, I think by David
Magda, ordinary hardware will show this problem _if flushes are not
functioning_ (an unusual case on bare metal), while on VirtualBox
this is the default!
...
Doesn't ZIL effectively make ZFS into a journalled file system?
Of course ZFS is transactional, as are other filesystems and software
systems, such as RDBMS. But integrity of such systems depends on a
hardware flush primitive that actually works. We are getting hoarse
repeating this.
--Toby
* Essentially 'commit' semantics: the flush is synchronous, and the
operation is complete only when the data is durably stored.
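
For anyone who wants to see what those semantics look like from the
application side, a minimal Python sketch follows. The helper name
durable_write is hypothetical. os.fsync() only asks the OS to push the
data to the device; whether the device (or a hypervisor in between)
really commits it before answering is the assumption this whole thread
is about.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync() has completed.

    Illustrative helper: "complete" here means the OS reported the
    flush as done, which is durable only if every layer below honors it.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # drain Python's userspace buffer to the OS
        os.fsync(f.fileno())    # ask the OS to flush to the storage device
```

A caller would treat the return of durable_write() as the point after
which dependent writes (the D, E, F of the example) may be issued.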
...