Uh, I should probably clarify some things (I was too quick to hit
send):

> IMO the fundamental problem is that the only way to achieve a write
> barrier is fsync() (disregarding direct I/O etc). Again I would just
> like an fbarrier() as I've mentioned on the list previously. It seems

Of course, if fbarrier() is analogous to fsync(), this does not actually
address the particular problem which is the main topic of this thread,
since there the fbarrier() would presumably apply only to I/O within
that file.

This particular case would only be helped if the fbarrier() were
global, or at least extending further than the particular file.

Fundamentally, I think a useful observation is that the only time you
ever care about persistence is when you make a contract with an
external party outside of your blackbox of I/O. Typical examples are
database commits and mail server queues. Anything within the blackbox
is only concerned with consistency.
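
To illustrate the distinction, here is a rough Python sketch of a
commit-style contract. The Journal class and its method names are made
up for this example; only os.open(), os.write() and os.fsync() are real
calls. Intermediate appends stay inside the black box and are never
synced; only commit(), the point where an external party is told "your
data is safe", forces persistence.

```python
import os


class Journal:
    """Hypothetical append-only log: only commit() forces persistence."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.syncs = 0  # counts how often we paid for persistence

    def append(self, record):
        # Inside the black box: nobody outside has been promised anything
        # yet, so no fsync() here. (Short writes are ignored in this sketch.)
        os.write(self.fd, record + b"\n")

    def commit(self):
        # The contract with the external party starts here, so the data
        # must be persistent before we acknowledge.
        os.fsync(self.fd)
        self.syncs += 1
        return True

    def close(self):
        os.close(self.fd)
```

Three append() calls followed by one commit() cost a single fsync(),
which is the whole point: persistence is paid for once per external
promise, not once per write.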

In this particular case, the fsync()/fbarrier() operate on the black
box of the file, with the directory being an external party. The
rename() operation on the directory entry constitutes an operation
which depends on the state of the individual file black box, thus
constituting an external dependency and thus requiring persistence.
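
For concreteness, a minimal Python sketch of the write-then-rename
pattern in question. Note that fbarrier() does not exist anywhere; the
helper below is hypothetical and is just the naive version, i.e. a
plain fsync(). The replace_file() name and the ".tmp" suffix are
likewise only illustrative.

```python
import os


def fbarrier(fd):
    # Hypothetical call, not a real API. The naive stand-in is fsync(),
    # which gives the ordering we want but also forces persistence.
    os.fsync(fd)


def replace_file(path, data):
    """Replace path so a reader sees either the old or the new content."""
    tmp = path + ".tmp"  # illustrative name; must be on the same file system
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        # The file's contents must be ordered before the directory update.
        fbarrier(fd)
    finally:
        os.close(fd)
    # os.replace() is rename() on POSIX: the directory entry now depends
    # on the state of the file written above.
    os.replace(tmp, path)
```

With a real fbarrier() the rename would only need to be ordered after
the data, not wait for it to hit disk, but as noted that requires the
barrier to extend beyond the single file.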

The question is whether it is necessarily a good idea to make the
blackbox be the entire file system. If it is, a lot of things would be
much much easier. On the other hand, it also makes optimization more
difficult in many cases. For example, the latency of persisting 8 KB of
data could be very significant if there are large amounts of bulk
I/O happening in the same file system. So I definitely see the
motivation behind having persistence guarantees be non-global.

Perhaps it boils down to the files+directory model not necessarily
being the best one in all cases. Perhaps one would like to define
subtrees which have global fsync()/fbarrier() type semantics within
each respective subtree.

On the other hand, that sounds a lot like a ZFS file system, other
than the fact that ZFS file system creation is not something which is
exposed to the application programmer.

How about having file-system global barrier/persistence semantics, but
having a well-defined API for creating child file systems rooted at
any point in a hierarchy? It would allow "global" semantics and what
that entails, while allowing that bulk I/O happening in your 1 TB
PostgreSQL database to be segregated, in terms of performance impact,
from your "kde settings" file system.

> What does one need to do to get something happening here? Other than
> whine on mailing lists...

And that came off much more rude than intended. Clearly it's not an
implementation effort issue, since the naive fbarrier() is basically
calling fsync(). However I get the feeling there is little motivation
in the operating system community for addressing these concerns, for
whatever reason (IIRC it was only recently that some write
barrier/write caching issues started being seriously discussed in the
Linux kernel community for example).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schul...@infidyne.com>'
Key retrieval: Send an E-Mail to getpgp...@scode.org
E-Mail: peter.schul...@infidyne.com Web: http://www.scode.org

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss