>>>>> "c" == Miles Nordin <car...@ivy.net> writes:

     c> fbarrier()

On second thought, that couldn't help this problem.  The goal is to
associate writing to the directory (rename) with writing to the file
referenced by that inode/handle (write/fsync/``fbarrier''), and in
POSIX these two things are pretty distant and unrelated to each other.
The POSIX way to associate these two things is to wait for fsync() to
return before asking for the rename.  The waiting is expressive: it's
an extremely simple, easy-to-understand API for associating one thing
with another.  I thought maybe this was so simple there was only one
thing, not two, so the wait could be skipped, but I was wrong.
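
To make the ordering concrete, here is a rough sketch of the
wait-for-fsync-then-rename sequence.  The names replace_file(),
write_all() and the temp-path argument are made up for illustration;
only open/write/fsync/close/rename/unlink are real POSIX calls.  It
skips EINTR retries and does not fsync the containing directory, so
treat it as an outline of the ordering rather than a complete recipe.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` with `data` by writing a temp
 * file, waiting for fsync() to return, and only then renaming.
 * Returns 0 on success, -1 on error (the temp file is removed). */
int replace_file(const char *path, const char *tmp_path,
                 const char *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The wait: fsync() must return before rename() is requested. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Only now touch the directory entry. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would do something like replace_file("rc", "rc.tmp", buf, n),
with both paths on the same filesystem so that rename() stays atomic.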

It is too bad because as others have said it means these fsync()'s
will have to go in to make the app correct/portable with the API we
have to work under, even though ZFS has certain convenient quirks and
probably doesn't need them.

IMHO the best reaction to the KDE hysteria would be to make sure
SQLite and BerkeleyDB are as fast as possible and effortlessly correct
on ZFS, and anything that's slow because of too much synchronous
writing to tiny files should use a library instead.  This is not
currently the case: for high performance one has to manually match DB
and ZFS record sizes, which isn't practical for these tiny throwaway
databases that must share a filesystem with non-DB stuff.  There might
be room for improvement in terms of online defragmentation too.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss