> I've been observing two threads on zfs-discuss with the following
> Subject lines:
>
> Yager on ZFS
> ZFS + DB + "fragments"
>
> and have reached the rather obvious conclusion that the author "can
> you guess?" is a professional spinmeister,

Ah - I see we have another incompetent psychic chiming in - and judging
by his drivel below a technical incompetent as well. While I really
can't help him with the former area, I can at least try to educate him
in the latter.

...

> Excerpt 1: Is this premium technical BullShit (BS) or what?

Since you asked: no, it's just clearly beyond your grade level, so I'll
try to dumb it down enough for you to follow.

> ------------- BS 301 'grad level technical BS' -----------
>
> Still, it does drive up snapshot overhead, and if you start trying to
> use snapshots to simulate 'continuous data protection' rather than
> more sparingly the problem becomes more significant (because each
> snapshot will catch any background defragmentation activity at a
> different point, such that common parent blocks may appear in more
> than one snapshot even if no child data has actually been updated).
> Once you introduce CDP into the process (and it's tempting to, since
> the file system is in a better position to handle it efficiently than
> some add-on product), rethinking how one approaches snapshots (and COW
> in general) starts to make more sense.

Do you by any chance not even know what 'continuous data protection'
is? It's considered a fairly desirable item these days and was the
basis for several hot start-ups (some since gobbled up by bigger fish
that apparently agreed that they were onto something significant),
since it allows you to roll back the state of individual files or the
system as a whole to *any* historical point you might want to (unlike
snapshots, which require that you anticipate points you might want to
roll back to and capture them explicitly - or take such frequent
snapshots that you'll probably be able to get at least somewhere near
any point you might want to, a second-class simulation of CDP which
some vendors offer because it's the best they can do and is precisely
the activity which I outlined above, expecting that anyone sufficiently
familiar with file systems to be able to follow the discussion would be
familiar with it).

But given your obvious limitations I guess I should spell it out in
words of even fewer syllables:

1. Simulating CDP without actually implementing it means taking very
frequent snapshots.

2. Taking very frequent snapshots means that you're likely to interrupt
background defragmentation activity such that one child of a parent is
moved *before* the snapshot is taken while another is moved *after* the
snapshot is taken, resulting in the need to capture a before-image of
the parent (because at least one of its pointers is about to change)
*and all ancestors of the parent* (because the pointer change will
propagate through all the ancestral checksums - and pointers, with COW)
in every snapshot that occurs immediately prior to moving *any* of its
children, rather than just having to capture a single before-image of
the parent and all its ancestors, after which all its child pointers
will likely get changed before the next snapshot is taken.

So that's what any competent reader should have been able to glean from
the comments that stymied you.

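A toy model makes the arithmetic in points 1 and 2 concrete. The sketch
below is illustrative Python only, with invented names and a deliberately
simplified retention rule; it is not ZFS code. It counts how many block
before-images end up retained when a defragmentation pass that relocates
two sibling blocks is, or is not, interrupted by a snapshot.

    # Toy model of COW snapshot overhead when very frequent snapshots
    # interleave with background defragmentation.  Invented names; not ZFS.

    class Block:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent          # ancestor whose pointer/checksum covers us
            self.version = 0              # bumped on every COW rewrite
            self.dirty_since_snap = False # already rewritten since the last snapshot?

    def cow_rewrite(block, retained):
        """Rewrite a block and, because its pointer/checksum changes, every
        ancestor up to the root.  The first rewrite after a snapshot leaves
        the old version behind in that snapshot: that retained copy is the
        overhead being counted."""
        while block is not None:
            if not block.dirty_since_snap:
                retained.append((block.name, block.version))
                block.dirty_since_snap = True
            block.version += 1
            block = block.parent

    def take_snapshot(tree):
        for b in tree:
            b.dirty_since_snap = False

    # Tree: root -> parent -> two children that defragmentation relocates.
    root   = Block("root")
    parent = Block("parent", root)
    c1     = Block("child1", parent)
    c2     = Block("child2", parent)
    tree   = [root, parent, c1, c2]

    # Case A: both children are moved between two snapshots.
    retained_a = []
    take_snapshot(tree)
    cow_rewrite(c1, retained_a)   # parent and root rewritten once
    cow_rewrite(c2, retained_a)   # parent and root already dirty, no new copies
    take_snapshot(tree)

    # Case B: a snapshot lands between the two moves (CDP-style frequency).
    retained_b = []
    take_snapshot(tree)
    cow_rewrite(c1, retained_b)
    take_snapshot(tree)           # interrupts the defragmentation pass
    cow_rewrite(c2, retained_b)   # parent and root retained a second time
    take_snapshot(tree)

    print(len(retained_a), len(retained_b))   # 4 versus 6 retained block versions

With a deeper tree and more frequent snapshots, each interruption repeats
the whole ancestor chain, so the gap grows accordingly.
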
The paragraph's concluding comments were considerably more general in
nature and thus legitimately harder to follow: had you asked for
clarification rather than just assumed that they were BS simply because
you couldn't understand them, you would not have looked like such an
idiot, but since you did call them into question I'll now put a bit
more flesh on them for those who may be able to follow a discussion at
that level of detail:

3. The file system is in a better position to handle CDP than some
external mechanism because a) the file system knows (right down to the
byte level if it wants to) exactly what any individual update is
changing, b) the file system knows which updates are significant (e.g.,
there's probably no intrinsic need to capture rollback information for
lazy writes because the application didn't care whether they were made
persistent at that time, but for any explicitly-forced writes or syncs
a rollback point should be established), and c) the file system is
already performing log forces (where a log is involved) or batch disk
updates (a la ZFS) to honor such application-requested persistence, and
can piggyback the required CDP before-image persistence on them rather
than requiring separate synchronous log or disk accesses to do so.

4. If you've got full-fledged CDP, it's questionable whether you need
snapshots as well (unless you have really, really inflexible
requirements for virtually instantaneous rollback and/or for
high-performance writable-clone access) - and if CDP turns out to be
this decade's important new file system feature just as snapshots were
last decade's, it will be well worth having optimized for.

5. Block-level COW technology just doesn't cut it for full-fledged CDP
unless you can assume truly unlimited storage space: not only does it
encounter even worse instances of the defrag-related parent-block
issues described above (which I brought up in a different context) but,
far worse, it requires that every generation of every block in the
system live forever (or at least for the entire time-span within which
rollback is contemplated).

Hence the rethinking that I mentioned. COW techniques are attractive
from an ease-of-implementation standpoint for moderately infrequent
snapshots, but as one approaches CDP-like support they become
increasingly infeasible. Transaction-log-protected approaches, by
contrast, can handle CDP very efficiently (in a manner analogous to a
database before-image rollback log), as well as being able to offer
everything else that ZFS does with better run-time efficiency (e.g., no
longer must the entire ancestral path be updated on disk whenever a
leaf node is), plus update-in-place facilities to support good
sequential-streaming performance where that makes sense - but at the
cost not only of increased implementation complexity but of the need
for something resembling innovation (at least in the file system
context). And for the occasional installation that really requires
high-performance snapshot rollback/writable clone facilities, you can
still implement them effectively at the block level *underneath* all
this file-level stuff and then get rid of their overhead when the
requirement has expired.

That's as dumbed-down as I'm going to get: if you still can't
understand it, please seek help from a colleague.

> ------------- end of BS 301 'grad level technical BS' -----------
>
> Comment: Amazing: so many words, so little meaningful technical
> content!

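A similarly minimal sketch of the before-image rollback log mentioned in
point 5 (hypothetical names and structure throughout, nothing taken from
any real implementation): each significant overwrite records the prior
contents, so the store can be rolled back to any earlier instant without
keeping every block generation alive the way block-level COW would.

    # Minimal sketch of a before-image rollback log for CDP, analogous to a
    # database undo log.  Invented names; Python used purely for brevity.

    import itertools

    class CDPStore:
        def __init__(self):
            self.blocks = {}                  # block address -> current contents
            self.undo_log = []                # (point, address, before-image), append-only
            self._clock = itertools.count(1)  # stand-in for real timestamps

        def now(self):
            return next(self._clock)

        def write(self, addr, data, forced=True):
            # Per point 3: only explicitly-forced writes need a rollback point,
            # and the before-image record can ride along with the log force or
            # batch disk update that the sync already requires.
            if forced:
                self.undo_log.append((self.now(), addr, self.blocks.get(addr)))
            self.blocks[addr] = data          # update in place

        def rollback_to(self, point):
            # Undo, newest first, every logged overwrite made after 'point'.
            while self.undo_log and self.undo_log[-1][0] > point:
                _, addr, before = self.undo_log.pop()
                if before is None:
                    self.blocks.pop(addr, None)   # block did not exist back then
                else:
                    self.blocks[addr] = before

    store = CDPStore()
    store.write("blk0", "v1")
    point = store.now()                       # *any* instant can be a rollback target
    store.write("blk0", "v2")
    store.write("blk1", "oops")
    store.rollback_to(point)
    print(store.blocks)                       # {'blk0': 'v1'}

In this arrangement the undo log grows only with the logged updates inside
the rollback window and can be truncated as that window ages out, rather
than pinning entire block generations for the duration.
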
Oh, dear: you seem to have answered the question that you posed above
with the same abject cluelessness which you're bringing to the rest of
your post. Oh, well: there's something to be said for consistency, I
guess.

> Excerpt 2: Even better than Excerpt 1 - truly exceptional BullShit:

No - just truly exceptional arrogance on your part: you really ought to
develop at least minimal understanding of a subject before deciding to
tackle someone who already has a great deal more than that.

> ------------- BS 401 'PhD level technical BS' ------------------
>
> No, but I described how to use a transaction log to do so and later on
> in the post how ZFS could implement a different solution more
> consistent with its current behavior. In the case of the transaction
> log, the key is to use the log not only to protect the RAID update but
> to protect the associated higher-level file operation as well, such
> that a single log force satisfies both (otherwise, logging the RAID
> update separately would indeed slow things down - unless you had NVRAM
> to use for it, in which case you've effectively just reimplemented a
> low-end RAID controller - which is probably why no one has implemented
> that kind of solution in a stand-alone software RAID product).

That one was already clear enough that I'm just going to let you find a
helpful colleague to explain it to you, as suggested above. Someone who
knows something about write-ahead logging - and how it's usable not
only to protect operations but to enhance their performance, by
capturing in the log (or in supplements to it) the information required
to replay one or more serial updates to the same or associated data
while deferring final batch-propagation of those updates back to the
main database - might be a good bet.

Making due allowances for your being in Texas and thus being intimately
acquainted with BS on a very personal level, I'll suggest that you
refrain from further solidifying that state's stereotypes in the
Internet group-mind until you've cleared your insights with someone who
actually has some acquaintance with technologies such as file systems,
and transaction managers, and log implementations - all of which I have
both studied in depth and been well-paid to write from scratch.

You might also consider posting your babble from a personal rather than
a professional location, unless your profession is completely unrelated
to technology in particular and to competence in general.

- bill

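To make the "single log force satisfies both" point in the quoted excerpt
concrete, here is a rough write-ahead-logging sketch (invented names,
Python only for brevity, not anyone's actual product): the file-level
operation and its associated RAID/parity update go into the log together,
one synchronous force covers both, and propagation of the in-place updates
is deferred to a later batch checkpoint.

    # Rough sketch of one log force covering both a file operation and its
    # associated RAID/parity update, with deferred batch propagation.

    class WriteAheadLog:
        def __init__(self):
            self.buffered = []   # appended records not yet on stable storage
            self.stable = []     # records made durable by a log force
            self.forces = 0      # number of synchronous log writes

        def append(self, record):
            self.buffered.append(record)          # cheap, in-memory

        def force(self):
            # One synchronous write makes every buffered record durable.
            if self.buffered:
                self.stable.extend(self.buffered)
                self.buffered.clear()
                self.forces += 1

    class LoggedStore:
        def __init__(self):
            self.wal = WriteAheadLog()
            self.pending = []    # logged updates awaiting batch propagation
            self.disk = {}       # the 'main database' of in-place block data

        def synced_file_write(self, path, data, parity):
            # Log the file-level operation *and* the associated RAID/parity
            # update together, then force once: a single log force protects
            # both, while the expensive in-place updates are deferred.
            self.wal.append(("file-op", path, data))
            self.wal.append(("raid-parity", path, parity))
            self.wal.force()
            self.pending.append((path, data, parity))

        def checkpoint(self):
            # Deferred batch propagation of logged updates back to the main
            # store; once complete, the covering log records could be reclaimed.
            for path, data, parity in self.pending:
                self.disk[path] = (data, parity)
            self.pending.clear()

    store = LoggedStore()
    store.synced_file_write("/f1", "data-1", "parity-1")
    store.synced_file_write("/f2", "data-2", "parity-2")
    store.checkpoint()
    print(store.wal.forces, store.disk)   # 2 log forces covered 4 logged records

The point of the arrangement is that the synchronous cost per forced write
stays at one log I/O no matter how many associated records ride along with
it, with the in-place updates paid for later in batch.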