First of all, let's agree that this discussion of File Versioning (FV) makes no further reference to its use as Version Control. That is, we aren't going to talk about it being useful for source code, other than in the context where a source code file is a document, like any other text document. File Versioning and Version Control are separate things, with different purposes and feature sets.

OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string appended is unacceptable, as it creates enormous directory pollution, not to mention user confusion. So, FV has to be invisible to non-aware programs.

Now we have a problem: how do we access FV for non-local (e.g. SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is in the network file server arena, unless we can use FV over the network, it is useless. You can't modify the SMB or NFS protocol (easily or quickly) to add FV functionality (look how hard it was to add ACLs to these protocols).

About the only way I can think of around this problem is to store versions in a special subdir of each directory (e.g. .zfs_version), which would then be browsable over the network, using tools not normally FV-aware. But this puts us back into the problem of a directory which potentially has hundreds or thousands of files.

Also, "save-early-save-often" results in a version explosion, as does auto-save in the app. While this may indeed mean that you have all of your changes around, figuring out which version has them can be massively time-consuming. Let's say you have auto-save set for 5 minutes (very common in MS Word). That gives you 12 versions per hour. If you suddenly decide you want to back up a couple of hours, that leaves you looking at a whole bunch of files, trying to figure out which one you want. E.g. I want a file from about 3 hours ago. Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours ago? And what if I've mis-remembered, and it really was closer to 4 hours ago? Yes, the data is eventually there. However, wouldn't a 1-hour snapshot capability have saved you an enormous amount of time by simplifying your search? Yes, you won't have _exactly_ the version you want, but odds are you will have something close, and you can put all the time you would have spent searching the FV tree into restarting work from the snapshotted version.
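
To put rough numbers on that search, here is a minimal sketch. It assumes copies land at exact multiples of the save interval, and that "about 3 hours ago" means 180 minutes plus or minus 15; the function name and the slack window are made up for illustration.

```python
def candidate_versions(interval_min, target_min, slack_min):
    """Return the ages (in minutes) of saved copies within target +/- slack.

    Assumes a copy exists at every exact multiple of interval_min.
    """
    lo, hi = target_min - slack_min, target_min + slack_min
    # Round lo up to the nearest multiple of interval_min.
    first = -(-lo // interval_min) * interval_min
    return list(range(first, hi + 1, interval_min))


# 5-minute auto-save versus hourly snapshots, looking for "about 3 hours ago".
autosave = candidate_versions(5, 180, 15)
hourly = candidate_versions(60, 180, 15)
print(len(autosave), "auto-saved versions to inspect:", autosave)
print(len(hourly), "snapshot to inspect:", hourly)
```

With these assumptions the 5-minute auto-save leaves seven candidates (2:45 through 3:15), while the hourly snapshot leaves one. Widen the slack to cover the "maybe it was 4 hours ago" case and the FV list grows much faster than the snapshot list.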

Remember, FV's main audience is going to be "naive" users, not us technical users, who generally have the problem that FV solves under control (yes, FV would make it easier for us, but we're not the primary target). Version explosion (and the consequential problem of picking the right version to edit) is a huge problem for the naive audience.

Also, a big difference between Snapshots and FV tends to be who controls EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and their aging is strictly controlled and defined (e.g. "we keep hourly snapshots for 1 week"). File versioning is typically under the control of the End-User, as its utility is much more nebulously defined. Certainly, there is no sensible way to truncate based on number of versions (e.g. "we only allow 100 versions to be kept"), since the frequency of versioning a file varies widely. Aging on a version is possibly a better answer, but this runs into a problem of user education, where we have to retrain our users to stop making frequent copies of important documents (like they do now, in the absence of FV), but _do_ remember to dig through the FV archive periodically to save a desirable old copy. Also, if managing FV is to be a User task, how are they to do it over NFS/SAMBA? And, "log into the NFS server to do a cleanup" isn't an acceptable answer.
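
To illustrate why a fixed version count makes a poor limit, here is a small sketch. It assumes one version per save at a steady interval, and the helper name is hypothetical; the 100-version cap is the example figure from above.

```python
def history_covered_hours(max_versions, save_interval_min):
    """Hours of history retained when only the newest max_versions are kept.

    Assumes one version is created per save, at a steady interval.
    """
    return max_versions * save_interval_min / 60


# Same 100-version cap, two very different files.
autosaved_doc = history_covered_hours(100, 5)        # auto-saved every 5 minutes
daily_file = history_covered_hours(100, 24 * 60)     # saved once a day
print(f"auto-saved document: {autosaved_doc:.1f} hours of history")
print(f"daily file: {daily_file / 24:.0f} days of history")
```

Under these assumptions the same cap keeps barely one working day of history for the actively edited document and 100 days for the file touched once a day, which is the "frequency varies widely" problem in numbers.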

Also, FV is only useful for apps which do a "close()" on a file (or at least, I'm assuming we wait for a file to signal that it is closed before taking a version - otherwise, we do what? Take a version every X minutes while the file is still open? I shudder to think about the implementation of this, and its implications...). How many apps keep a file open for a long period of time? FV isn't useful to them; only an "unlimited undo" functionality INSIDE the app is.

Lastly, consider the additional storage requirement of FV, and exactly how much utility you gain for sacrificing disk space. Look at this scenario: I'm editing a file, making 1MB of change per 5 minutes (a likely scenario when actively editing any Office-style document), of which only 50% I actually make permanent (the rest being temp edits for ideas I decide to change or throw out). If I'm auto-saving every 5 minutes, that means I use 12MB of version space per hour. If I took an hourly snapshot, then I need only 6MB of storage. The situation gets worse, for the primary usefulness of FV is for files which are frequently edited - meaning that they have rapid content change, and not in append-mode. Such a usage pattern means that FV will take up a much greater amount of space than periodic snapshots, as the longer interval in snapshots will allow the changes to "settle".
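
The arithmetic in that scenario can be sketched as follows. The assumptions are mine: each saved version costs only the changed data since the previous save, and an hourly snapshot ends up holding only the changes that survived the hour. The function name is hypothetical.

```python
def storage_per_hour_mb(change_mb_per_save, save_interval_min, kept_fraction):
    """Return (per-save versioning MB, hourly snapshot MB) for one hour of editing.

    Assumes each version stores only the data changed since the previous save,
    and that the snapshot holds only the fraction of changes that was kept.
    """
    saves_per_hour = 60 // save_interval_min
    versioned_mb = change_mb_per_save * saves_per_hour
    snapshot_mb = versioned_mb * kept_fraction
    return versioned_mb, snapshot_mb


# 1MB of change per 5-minute auto-save, of which 50% is made permanent.
fv_mb, snap_mb = storage_per_hour_mb(1, 5, 0.5)
print(f"FV: {fv_mb}MB per hour, hourly snapshot: {snap_mb:g}MB per hour")
```

The gap scales with how much of the editing is throwaway: the lower the kept fraction, the more of the FV space is spent on changes nobody will want back.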


To me, FV is/was very useful in TOPS-20 and VMS, where you were looking at a system DESIGNED with the idea in mind, which already had a user base trained to use and expect it, and where virtually all usage was local (i.e. no network filesharing). None of this is true in the UNIX/POSIX world.


-Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
