>>>>> "enh" == Edward Ned Harvey <solar...@nedharvey.com> writes:
enh> If you have zpool less than version 19 (when the ability to
enh> remove a log device was introduced) and you have a non-mirrored
enh> log device that failed, you had better treat the situation as
enh> an emergency.

Ed, the log device removal support is only good for adding a slog to
try it out, then changing your mind and removing the slog (which was
not possible before).  It doesn't change the reliability situation one
bit: pools with dead slogs are still not importable.

There've been threads on this for a while.  It's well-discussed
because it's an example of the IMHO broken process of ``obviously a
critical requirement, but not technically part of the original RFE,
which is already late,'' as well as a dangerous pitfall for ZFS
admins.  I imagine the process works well in other cases to keep stuff
granular enough that it can be prioritized effectively, but in this
case it's made the slog feature significantly incomplete for a couple
of years, put many production systems in a precarious spot, and the
whole mess was predicted before the slog feature was integrated.

>> The on-disk log (slog or otherwise), if I understand right, can
>> actually make the filesystem recover to a crash-INconsistent
>> state

enh> You're speaking the opposite of common sense.

Yeah, I'm doing it on purpose, to suggest that just guessing how you
feel things ought to work based on vague notions of economy isn't a
good idea.

enh> If disabling the ZIL makes the system faster *and* less prone
enh> to data corruption, please explain why we don't all disable
enh> the ZIL?

I said complying with fsync() can make the system recover to a state
not equal to one you might have hypothetically snapshotted in a moment
leading up to the crash.

Elsewhere I might've said disabling the ZIL does not make the system
more prone to data corruption, *iff* you are not an NFS server.  If
you are, disabling the ZIL can lead to lost writes when the NFS server
reboots but an NFS client does not, which can definitely cause
app-level data corruption.

Disabling the ZIL breaks the D requirement of ACID databases, which
might screw up apps that replicate, or that keep databases on several
separate servers in sync, and it might lead to lost mail on an MTA.
But because, unlike on non-COW filesystems, it costs ZFS nothing extra
to preserve write ordering even without fsync(), AIUI you will not get
corrupted application-level data by disabling the ZIL.  You just get
missing data that the app has a right to expect should be there.

The dire warnings written by kernel developers in the wikis, ``don't
EVER disable the ZIL,'' are totally ridiculous and inappropriate IMO.
I think they probably just worked really hard to write the ZIL piece
of ZFS and don't want people telling their brilliant code to fuck off
just because it makes things a little slower, so we get all this
``enterprise'' snobbery and so on.

``crash consistent'' is a technical term, not a common-sense term, and
I may have used it incorrectly:

  http://oraclestorageguy.typepad.com/oraclestorageguy/2007/07/why-emc-technol.html

On a system that loses power while fsync() had been in use, the files
getting fsync()'ed will probably recover to more recent versions than
the rest of the files, which means the recovered state achieved by
yanking the cord couldn't have been emulated by cloning a snapshot
without actually having lost power.  However, the app calling fsync()
expects exactly this, so it's not supposed to lead to
application-level inconsistency.
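(To be concrete about the snapshot-cloning I keep referring to, since
I lean on it again below: the test I have in mind is just this, with
the pool and dataset names made up:

    zfs snapshot tank/app@plugpull
    zfs clone tank/app@plugpull tank/recoverytest
    # point a second instance of the app at the clone and see
    # whether it recovers cleanly

Snapshots are atomic, so the clone really is a point-in-time image of
the dataset, which is why it looks like a fair simulation of losing
power.)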
If you test your app's recovery ability in just that way, by cloning
snapshots of filesystems on which the app is actively writing and then
seeing whether the app can recover the clone, then you're
unfortunately not testing the app quite hard enough where fsync() is
involved.  So yeah, I guess disabling the ZIL might in theory make
incorrectly-written apps less prone to data corruption.  Likewise, no
testing of the app on ZFS will be aggressive enough to make the app
powerfail-proof on a non-COW POSIX system, because ZFS preserves more
ordering than the API actually guarantees to the app.

I'm repeating myself, though.  I wish you'd just read my posts with at
least paragraph granularity instead of picking out individual
sentences and discarding everything that seems too complicated or too
awkwardly stated.

I'm basing this all on the ``common sense'' that, to do otherwise,
fsync() would have to completely ignore its file descriptor argument.
It'd have to commit the entire in-memory ZIL to the slog and behave
the same as 'lockfs -fa', which I think would perform too badly
compared to non-ZFS filesystems' fsync()s, and it would lead to
emphatic performance advice like ``segregate files that get lots of
fsync()s into separate ZFS datasets from files that get high write
bandwidth.''  We don't have advice like that in the blogs/lists/wikis,
which makes me think it's not beneficial (the benefit would be
dramatic if it were!) and that fsync() works the way I think it does.
It's a slightly more convoluted type of ``common sense'' than yours,
but mine could still be wrong.
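P.S. for anyone who'd rather experiment with the ZIL question above
than argue common sense: on builds of this vintage the knob is the
system-wide zil_disable tunable, not a per-dataset property.  This is
a sketch from memory, so double-check the Evil Tuning Guide before
trusting it:

    # in /etc/system, then reboot:
    set zfs:zil_disable = 1

    # or on a live kernel; AIUI it only affects datasets mounted
    # after you set it:
    echo zil_disable/W0t1 | mdb -kw

It disables the ZIL for every dataset on the box, NFS exports
included, which is exactly the lost-write scenario I described.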