First of all, let's agree that this discussion of File Versioning makes
no further reference to its use as Version Control. That is, we aren't
going to talk about it being useful for source code, other than in the
context where a source code file is a document, like any other text
document. File Versioning and Version Control are separate things, with
different purposes and feature sets.
OK. So, now we're on to FV. As Nico pointed out, FV is going to need a
new API. Using the VMS convention of simply creating file names with a
version string afterwards is unacceptable, as it creates enormous
directory pollution, not to mention user confusion. So, FV has to be
invisible to non-aware programs.
Now we have a problem: how do we access FV for non-local (e.g.
SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is in
the network file server arena, unless we can use FV over the network, it
is useless. You can't modify the SMB or NFS protocol (easily or
quickly) to add FV functionality (look how hard it was to add ACLs to
these protocols).
About the only way around this problem I can think of is to store versions
in a special subdir of each directory (e.g. .zfs_version), which would
then be browsable over the network, using tools not normally FV-aware.
But this puts us back into the problem of a directory which potentially
has hundreds or thousands of files.
Also, "save-early-save-often" results in a version explosion, as does
auto-save in the app. While this may indeed mean that you have all of
your changes around, figuring out which version has them can be
massively time-consuming. Let's say you have auto-save set for 5
minutes (very common in MS Word). That gives you 12 versions per hour.
If you suddenly decide you want to back up a couple of hours, that
leaves you looking at a whole bunch of files, trying to figure out
which one you want. E.g. I want a file from about 3 hours ago. Do I
want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours
ago? And, what if I've mis-remembered, and it really was closer to 4
hours ago? Yes, the data is in there somewhere. However, wouldn't a
1-hour snapshot capability have saved you an enormous amount of time, by
simplifying your search? (Yes, you won't have _exactly_ the version you
want, but odds are you will have something close, and you can put all
the time you would have spent searching the FV tree into restarting work
from the snapshotted version.)
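To put rough numbers on that search, here is a minimal Python sketch. It assumes a copy exists at every multiple of the save interval counting back from now, and the function name and the "slack" parameter are mine, purely for illustration.

```python
def candidates_in_window(interval_min, center_min, slack_min):
    """Count saved copies whose age lies within center_min +/- slack_min.

    Assumes copies exist at ages 0, interval, 2*interval, ... minutes.
    """
    lo = max(center_min - slack_min, 0)
    hi = center_min + slack_min
    first = -(-lo // interval_min)  # ceiling division: oldest index >= lo
    last = hi // interval_min       # newest index <= hi
    return max(last - first + 1, 0)


# "About 3 hours ago", give or take 15 minutes (2:45 through 3:15).
fv_candidates = candidates_in_window(5, 180, 15)        # 5-minute auto-save
snapshot_candidates = candidates_in_window(60, 180, 15) # hourly snapshots

# Mis-remembered: somewhere between 2:45 and 4:15 ago.
fv_wide = candidates_in_window(5, 210, 45)
snapshot_wide = candidates_in_window(60, 210, 45)

print(fv_candidates, snapshot_candidates, fv_wide, snapshot_wide)
```

Under those assumptions the narrow search has 7 auto-saved versions to look through against 1 snapshot, and the wider one has 19 against 2.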
Remember, FV's main audience is going to be "naive" users, not us
technical users, who generally have the problem that FV solves under
control (yes, FV would make it easier for us, but we're not the primary
target). Version explosion (and the consequential problem of picking
the right version to edit) is a huge problem for the naive audience.
Also, a big difference between Snapshots and FV tends to be who controls
EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and
their aging is strictly controlled and defined (e.g. "we keep hourly
snapshots for 1 week"). File versioning is typically under the control
of the End-User, as its utility is much more nebulously defined.
Certainly, there is no ability to truncate based on number of versions
(e.g. "we only allow 100 versions to be kept"), since the frequency of
versioning a file varies widely. Aging on a version is possibly a
better answer, but this runs into a problem of user education, where we
have to retrain our users to stop making frequent copies of important
documents (like they do now, in absence of FV), but _do_ remember to dig
through the FV archive periodically to save a desirable old copy.
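A quick sketch of why a fixed version count is a poor limit. The 100-version cap and the 5-minute auto-save come from the text above; the once-a-day file and the function name are assumptions added for contrast.

```python
def history_covered_hours(max_versions, minutes_between_saves):
    """Hours of history retained when only the newest max_versions are kept,
    assuming saves happen at a steady interval."""
    return max_versions * minutes_between_saves / 60


# Document being actively edited with 5-minute auto-save.
busy_file_hours = history_covered_hours(100, 5)

# Hypothetical file that is saved once a day.
quiet_file_hours = history_covered_hours(100, 24 * 60)

print(f"busy file:  {busy_file_hours:.1f} hours of history")
print(f"quiet file: {quiet_file_hours / 24:.0f} days of history")
```

The same cap keeps a bit over 8 hours of history for the busy file and 100 days for the quiet one, so it says nothing useful about how far back a user can actually go.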
Also, if managing FV is to be a User task, how are they to do it over
NFS/SAMBA? And, "log into the NFS server to do a cleanup" isn't an
acceptable answer.
Also, FV is only useful for apps which do a "close()" on a file (or at
least, I'm assuming we wait for a file to signal that it is closed
before taking a version - otherwise, we do what? Take a version every X
minutes while the file is still open? I shudder to think about the
implementation of this, and its implications...). How many apps keep a
file open for a long period of time? FV isn't useful to them; only
"unlimited undo" functionality INSIDE the app is.
Lastly, consider the additional storage requirement of FV, and exactly
how much utility you gain for sacrificing disk space.
Look at this scenario: I'm editing a file, making 1MB of change per 5
minutes (a likely scenario when actively editing any Office-style
document), of which I actually make only 50% permanent (the rest
being temp edits for ideas I decide to change or throw out). If I'm
auto-saving every 5 minutes, that means I use 12MB of version space per
hour. If I took an hourly snapshot, I would need only 6MB of storage.
The situation gets worse, for the primary usefulness of FV is for files
which are frequently edited - meaning that they have rapid content
change, and not in append-mode. Such a usage pattern means that FV will
take up a much greater amount of space than periodic snapshots, as the
longer interval in snapshots will allow the changes to "settle".
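The arithmetic above, as a small Python sketch. It assumes every auto-saved version stores the full 1MB changed in its interval, and that a snapshot only pays for changes that survive until snapshot time; the function and parameter names are illustrative only.

```python
def fv_space_mb(hours, change_mb=1.0, interval_min=5):
    """Space used if every auto-save keeps its full set of changes."""
    versions = hours * 60 // interval_min
    return versions * change_mb


def snapshot_space_mb(hours, change_mb=1.0, interval_min=5, kept_fraction=0.5):
    """Space used by hourly snapshots, which only capture the changes
    that were still in the file when the snapshot was taken."""
    return fv_space_mb(hours, change_mb, interval_min) * kept_fraction


fv_one_hour = fv_space_mb(1)
snapshot_one_hour = snapshot_space_mb(1)
fv_workday = fv_space_mb(8)
snapshot_workday = snapshot_space_mb(8)

print(fv_one_hour, snapshot_one_hour, fv_workday, snapshot_workday)
```

With those assumptions one hour costs 12MB under FV against 6MB for the snapshot, and an 8-hour editing day costs 96MB against 48MB. A lower kept fraction widens the gap further.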
To me, FV is/was very useful in TOPS-20 and VMS, where you were looking
at a system DESIGNED with the idea in mind, already had a user base
trained to use and expect it, and virtually all usage was local (i.e. no
network filesharing). None of this is true in the UNIX/POSIX world.
-Erik