Re: [zfs-discuss] A versioning FS

Erik Trimble Fri, 06 Oct 2006 14:08:45 -0700

First of all, let's agree that this discussion of File Versioning makes no more reference to its usage as Version Control. That is, we aren't going to talk about it being useful for source code, other than in the context where a source code file is a document, like any other text document. File Versioning and Version Control are separate things, with different purposes and feature sets.

OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution, not to mention user confusion. So, FV has to be invisible to non-aware programs.

Now we have a problem: how do we access FV for non-local (e.g. SAMBA/NFS) clients? Since the VAST majority of FV's usefulness is in the network file server arena, FV is useless unless we can use it over the network. You can't modify the SMB or NFS protocol (easily or quickly) to add FV functionality (look how hard it was to add ACLs to these protocols).

About the only way around this problem I can think of is to store versions in a special subdir of each directory (e.g. .zfs_version), which would then be browsable over the network using tools that are not normally FV-aware. But this puts us back into the problem of a directory which potentially has hundreds or thousands of files.

Also, "save-early-save-often" results in a version explosion, as does auto-save in the app. While this may indeed mean that you have all of your changes around, figuring out which version has them can be massively time-consuming. Let's say you have auto-save set for 5 minutes (very common in MS Word). That gives you 12 versions per hour. If you suddenly decide you want to go back a couple of hours, that leaves you looking at a whole bunch of files, trying to figure out which one you want. E.g. I want a file from about 3 hours ago. Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours ago? And what if I've mis-remembered, and it really was closer to 4 hours ago? Yes, the data is eventually there. However, wouldn't a 1-hour snapshot capability have saved you an enormous amount of time by simplifying your search? (And yes, you won't have _exactly_ the version you want, but odds are you will have something close, and you can put all the time you would have spent searching the FV tree into restarting work from the snapshotted version.)
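To put numbers on the auto-save example, here is a minimal sketch that counts how many versions fall near the moment you are trying to recover, for 5-minute auto-save versus hourly snapshots. The helper name `versions_near` and the search windows are illustrative assumptions, not part of any FV implementation.

```python
from datetime import timedelta


def versions_near(interval: timedelta, target_age: timedelta,
                  window: timedelta) -> list[timedelta]:
    """Ages of versions taken every `interval` that fall within +/- `window`
    of `target_age` (how long ago the wanted version was made).

    Hypothetical helper for illustration only."""
    ages = []
    age = timedelta(0)
    while age <= target_age + window:
        if age >= target_age - window:
            ages.append(age)
        age += interval
    return ages


autosave = timedelta(minutes=5)
hourly = timedelta(hours=1)

# 5-minute auto-save yields 12 versions per hour.
print(timedelta(hours=1) // autosave)  # 12

# "About 3 hours ago", give or take 15 minutes:
print(len(versions_near(autosave, timedelta(hours=3), timedelta(minutes=15))))  # 7
print(len(versions_near(hourly, timedelta(hours=3), timedelta(minutes=15))))    # 1

# Unsure whether it was 3 or 4 hours ago: widen the window.
print(len(versions_near(autosave, timedelta(hours=3, minutes=30), timedelta(minutes=45))))  # 19
```

The 7 auto-save candidates are exactly the 2:45 through 3:15 list above; allowing for a mis-remembered time more than doubles that, while the hourly schedule offers only one or two choices.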

Remember, FV's main audience is going to be "naive" users, not us technical users, who generally already have the problem that FV solves under control (yes, FV would make it easier for us, but we're not the primary target). Version explosion (and the consequent problem of picking the right version to edit) is a huge problem for the naive audience.

Also, a big difference between Snapshots and FV tends to be who controls EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and their aging is strictly controlled and defined (e.g. "we keep hourly snapshots for 1 week"). File versioning is typically under the control of the End-User, as its utility is much more nebulously defined. Certainly, there is no ability to truncate based on number of versions (e.g. "we only allow 100 versions to be kept"), since the frequency of versioning a file varies widely. Aging on a version is possibly a better answer, but this runs into a problem of user education, where we have to retrain our users to stop making frequent copies of important documents (as they do now, in the absence of FV), but _do_ remember to dig through the FV archive periodically to save a desirable old copy. Also, if managing FV is to be a User task, how are they to do it over NFS/SAMBA? And "log into the NFS server to do a cleanup" isn't an acceptable answer.
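The contrast between an admin's age-based policy and a count-based cap can be made concrete. The sketch below is a hypothetical illustration (the function names and timestamps are assumptions): `prune_by_age` implements a rule like "keep hourly snapshots for 1 week", while `span_of_last_n` shows how much history a "keep 100 versions" cap actually preserves, which depends entirely on how often the file is saved.

```python
from datetime import datetime, timedelta


def prune_by_age(timestamps: list[datetime], now: datetime,
                 max_age: timedelta) -> list[datetime]:
    """Age-based retention: keep only versions no older than max_age."""
    return [t for t in timestamps if now - t <= max_age]


def span_of_last_n(save_interval: timedelta, n: int) -> timedelta:
    """History covered by the newest n versions if one is taken every save_interval."""
    return save_interval * (n - 1)


now = datetime(2006, 10, 6, 14, 0)
# Two weeks of hourly snapshots, newest first.
hourly = [now - timedelta(hours=h) for h in range(24 * 14)]
kept = prune_by_age(hourly, now, timedelta(weeks=1))
print(len(kept))  # 169: the current hour plus 168 older hourly snapshots

# A "100 versions" cap means very different things per file:
print(span_of_last_n(timedelta(minutes=5), 100))  # 8:15:00 with 5-minute auto-save
print(span_of_last_n(timedelta(days=1), 100))     # 99 days, 0:00:00 with daily saves
```

The age rule behaves predictably no matter how busy a file is; the count rule keeps barely a working day of history for an auto-saved document and months for a rarely touched one.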

Also, FV is only useful for apps which do a "close()" on a file (or at least, I'm assuming we wait for a file to signal that it is closed before taking a version; otherwise, what do we do? Take a version every X minutes while the file is still open? I shudder to think about the implementation of this, and its implications...). How many apps keep a file open for a long period of time? FV isn't useful to them; only an "unlimited undo" function INSIDE the app is.
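The close()-triggered model assumed above can be sketched in a few lines. This is a hypothetical in-memory model (the class and method names are invented for illustration and are not a ZFS interface): a version is recorded only when a handle is closed, so an app that writes several times and then closes gets one version, and an app that holds its file open gets none.

```python
class CloseVersioningStore:
    """Hypothetical FV model: one version is recorded per close(), never on write()."""

    def __init__(self) -> None:
        self.current: dict[str, str] = {}         # live file contents
        self.versions: dict[str, list[str]] = {}  # saved versions per file

    def open(self, name: str) -> "Handle":
        return Handle(self, name)


class Handle:
    def __init__(self, store: CloseVersioningStore, name: str) -> None:
        self.store = store
        self.name = name
        self.closed = False

    def write(self, text: str) -> None:
        if self.closed:
            raise ValueError("write on closed handle")
        # Appends to the live contents; no version is taken here.
        self.store.current[self.name] = self.store.current.get(self.name, "") + text

    def close(self) -> None:
        # The only point at which a version is captured.
        if not self.closed:
            self.closed = True
            snapshot = self.store.current.get(self.name, "")
            self.store.versions.setdefault(self.name, []).append(snapshot)


store = CloseVersioningStore()

# An editor that saves and closes: one version per close().
doc = store.open("report.doc")
doc.write("draft 1. ")
doc.write("draft 2.")
doc.close()

# An app that keeps its file open all day: many writes, zero versions.
db = store.open("app.db")
for _ in range(1000):
    db.write("x")

print(store.versions)  # {'report.doc': ['draft 1. draft 2.']}
```

Taking a version on a timer instead would mean snapshotting files mid-update, which is exactly the implementation worry raised above.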

Lastly, consider the additional storage requirement of FV, and exactly how much utility you gain for sacrificing disk space. Look at this scenario: I'm editing a file, making 1MB of changes per 5 minutes (a likely scenario when actively editing any Office-style document), of which I actually make only 50% permanent (the rest being temp edits for ideas I decide to change or throw out). If I'm auto-saving every 5 minutes, that means I use 12MB of version space per hour. If I took an hourly snapshot, then I would need only 6MB of storage. The situation gets worse, for the primary usefulness of FV is for files which are frequently edited, meaning that they have rapid content change, and not in append-mode. Such a usage pattern means that FV will take up a much greater amount of space than periodic snapshots, as the longer interval between snapshots allows the changes to "settle".
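The arithmetic in this scenario is simple enough to write down. A minimal sketch, taking the figures above (1MB of change per 5-minute auto-save, 50% of it kept) as inputs and assuming, as the scenario does, that an hourly snapshot only has to hold the net surviving change; the function names are hypothetical:

```python
def fv_mb_per_hour(change_mb_per_save: float, save_interval_min: float) -> float:
    """Every auto-save keeps its full delta as a separate version."""
    saves_per_hour = 60 / save_interval_min
    return change_mb_per_save * saves_per_hour


def snapshot_mb_per_hour(change_mb_per_save: float, save_interval_min: float,
                         permanent_fraction: float) -> float:
    """An hourly snapshot only captures the changes that survived the hour
    (simplifying assumption: discarded edits never reach the snapshot)."""
    return fv_mb_per_hour(change_mb_per_save, save_interval_min) * permanent_fraction


print(fv_mb_per_hour(1.0, 5))             # 12.0
print(snapshot_mb_per_hour(1.0, 5, 0.5))  # 6.0
```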

To me, FV is/was very useful in TOPS-20 and VMS, where you were looking at a system DESIGNED with the idea in mind, with a user base already trained to use and expect it, and where virtually all usage was local (i.e. no network filesharing). None of this is true in the UNIX/POSIX world.



-Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
