On Tue, Feb 26, 2008 at 2:07 PM, Nicolas Williams
<[EMAIL PROTECTED]> wrote:
> How do you use CDP "backups"?  How do you decide at which write(2) (or
>  dirty page write, or fsync(2), ...) to restore some file?  What if the
>  app has many files?  Point-in-time?  Sure, but since you can't restore
>  all application state (unless you're checkpointing processes too) then
>  how can you be sure that the data to be restored is internally
>  consistent?  And if you'll checkpoint processes, then why not just use
>  VMs and checkpoint those and their filesystems instead?  The last option
>  sounds much, much simpler to manage: there's only VM name and timestamp
>  to think about when restoring.  A continuous VM checkpoint facility
>  sounds... unlikely/expensive though.
Sorry, I don't understand any of this. But I never pretended I did.
My post was about something else:
In principle we have three types of write (taking the atomic view):
1. Create. The new file only needs to be written; no backup/CDP
action is needed. Identical to any conventional system.
2. Edit/Modify. Here we need to store some incremental/differential
file content, rsync-like. (A sketch follows after this list.)
3. Remove. This is also similar to a conventional system, except
that the file needs to be retired and its blocks must *not* be marked
as 'available'.
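
To make this concrete: a minimal sketch of how an event-driven CDP
layer could treat the three cases. Nothing here is real ZFS machinery;
the event hook, the history directory, and the fixed-block delta are
all made up for illustration (rsync proper uses rolling checksums):

    import os
    import shutil
    import time

    HISTORY = "/var/cdp/history"   # hypothetical retention area

    def block_delta(old_path, new_path, blocksize=4096):
        # Naive fixed-block delta: (offset, new_bytes) for each changed
        # block. Only shows the idea, not the real rsync algorithm.
        delta = []
        offset = 0
        with open(old_path, "rb") as old, open(new_path, "rb") as new:
            while True:
                a, b = old.read(blocksize), new.read(blocksize)
                if not a and not b:
                    break
                if a != b:
                    delta.append((offset, b))
                offset += blocksize
        return delta

    def retire(path):
        # Case 3: the name vanishes from the live tree, the content stays.
        os.makedirs(HISTORY, exist_ok=True)
        stamp = time.strftime("%Y%m%dT%H%M%S")
        dest = os.path.join(HISTORY, os.path.basename(path) + "." + stamp)
        shutil.move(path, dest)

    def on_write_event(kind, path, previous_version=None):
        if kind == "create":
            pass                 # case 1: nothing beyond the ordinary write
        elif kind == "modify" and previous_version is not None:
            delta = block_delta(previous_version, path)   # case 2
            # ... persist delta under HISTORY ...
        elif kind == "remove":
            retire(path)         # case 3: blocks NOT marked 'available'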

Changes combined with a 'write'/'save' instruction are not seen very
frequently on personal/home machines. (Let's leave out the web cache
and /tmp.) But even on the servers that I am running, the gigabytes
of user data do not change very much, measured as a percentage of the
overall data. Most of the 200,000 files that the users have remain
unmodified for ages. Office files do change, but not much faster than
the users can type ;) . Web content changes rarely; style sheets and
icons remain unmodified close to forever. The largest changes come
with system/software upgrades. (One might even consider excluding
these from CDP, and instead automating a snapshot beforehand, to fall
back to in case of a problem afterwards. But that is not my topic
here and now.)
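
(Just to illustrate that snapshot-before-upgrade aside: a minimal
sketch wrapping the standard 'zfs snapshot' command from a script.
The dataset name tank/home is made up:)

    import subprocess
    import time

    def pre_upgrade_snapshot(dataset="tank/home"):
        # Take a ZFS snapshot before the upgrade; roll back with
        # 'zfs rollback <name>' if the upgrade goes wrong.
        name = dataset + "@pre-upgrade-" + time.strftime("%Y%m%dT%H%M%S")
        subprocess.run(["zfs", "snapshot", name], check=True)
        return name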

Also, the granularity of the 'backups' does not really have to be
100%. If, for reasons I cannot imagine, a certain file were marked
for 'save' three times in a single second, you of course don't need
all of those states. You do have the state at the start of that one
second (to which you can roll), as well as the state at the end of
that second (to which you can roll just as well; and you can even
roll back and forward). I can hardly imagine a data file to which one
would want to roll that was invalid at the start of that second, is
invalid at the end, but was valid for some milliseconds in between.
(And one would have to ask how anybody could know about that
intermediate correctness.)

Outside of databases, a valid state once per 10 seconds is probably
already overkill. Don't forget: even if you delete the file, it will
still be there. If you 'save' a file, make a change, 'save' again,
make a mistake and 'save' again, then notice the mistake ... all of
this within 10 seconds! ... you will still have the state at the
beginning of those 10 seconds, as well as the state at their end. And
10 seconds is a hell of a lot of time to calculate and store an
incremental difference of a single file.

In a Time Machine, by contrast, 10 seconds can be a hell of a short
time. Plus there is the huge overhead, because you have to poll
regularly, possibly at much too high a level, to find out which files
have changed. Actually, chances are that none at all has changed (at
least in the /home of the user, perhaps even in the /home of all
user*s*). Once it is event-driven, 'no change' means no activity at
all. And once it is event-driven and you get 3 changes in 10 seconds,
I am pretty sure that all of those states can be handled without much
trouble.
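
To show how little is needed once it is event-driven: a minimal
sketch of per-file coalescing to one stored state per 10-second
window. The event source and persist_delta are invented stand-ins,
and a real implementation would also flush on a timer when a window
expires with a change still pending:

    import time

    WINDOW = 10.0   # seconds; the granularity argued for above

    class Coalescer:
        # Per file, persist at most one state per WINDOW. The state at
        # the start of a window is already in the store from the last
        # flush; intermediate saves inside the window are dropped.
        def __init__(self):
            self.last_flush = {}   # path -> time of last persist
            self.pending = {}      # path -> most recent state this window

        def on_change(self, path, state):
            # called by the (hypothetical) event source on every save
            now = time.monotonic()
            self.pending[path] = state
            if now - self.last_flush.get(path, float("-inf")) >= WINDOW:
                self.flush(path, now)

        def flush(self, path, now):
            state = self.pending.pop(path, None)
            if state is not None:
                persist_delta(path, state)   # store incremental difference
                self.last_flush[path] = now

    def persist_delta(path, state):
        # stand-in for computing and storing an rsync-like delta
        print("persist", path, len(state), "bytes")

Three saves within 10 seconds thus cost one stored delta, and when
nothing changes the code does nothing at all.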

Uwe