> A good handful of people approached me later, being
> curious and fascinated by the idea to replace the
> backup scheduler with an event-driven creation of the
> versions.

Uwe,

I'm still struggling to decide if ADM is what you're looking for.  When you 
make comments like the one quoted above, I think ADM is a very practical choice 
for you.  Even if it isn't, the issues discussed here are what lead people to 
an ADM-like solution.

Let me attempt to summarize the dilemmas as I see them, and point out the 
practicality of an ADM-like solution...

* Application-agnostic CDP cannot know when the file state is sane.  For true 
CDP this essentially requires preserving the entire write stream, which is an 
enormous burden (in both storage capacity and system bandwidth).  Presumably 
this burden is unacceptable except in niche cases.
Basically: it works, but it hurts.

* Application-aware (application-driven) CDP solves the file sanity challenge by 
having the application explicitly signal when its data is consistent.  But this 
will have an inherently limited market because it relies on application support.
Basically: it works, but requires coordination rarely found outside 
monopoly-owned stacks.

* Traditional backup leaves exposure windows and doesn't address the file 
sanity issue (unless there is a backup window, or specific assumptions are made).
Basically: it's easy because it overlooks so much.

Unless you have a large budget, some compromises need to be made.  IMO, ADM is 
a reasonable compromise for many.

With ADM, backing up files is typically initiated at a specified time after 
file modification.  For this discussion, think of it as: “make a new backup 
anytime file data is stable for X amount of time”.  There can be many policies 
for files with different usage patterns in a file system.  These should be 
tailored to business value, anticipated modification frequency, etc.  
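As a rough illustration of the "stable for X amount of time" test, here is a minimal Python sketch. It assumes the file's modification time (st_mtime) is a good enough proxy for "last data change"; is_stable is a hypothetical name, not ADM's implementation, which is driven by DMAPI events rather than by polling.

```python
import os
import time
from typing import Optional


def is_stable(path: str, stable_for: float, now: Optional[float] = None) -> bool:
    """Return True if the file's data has been unchanged for at least
    `stable_for` seconds. Hypothetical helper, not part of ADM."""
    if now is None:
        now = time.time()
    # st_mtime changes whenever the file's data is written.
    last_modified = os.stat(path).st_mtime
    return now - last_modified >= stable_for
```

A policy such as "stable for 5 minutes" would then call is_stable(path, 300).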

Here are a few examples of policies one might set up:
- Never back up files with /firefox/cache/ in the path.
- Back up (to disk) the CEO's StarOffice docs when they're stable for 1 minute.
- Back up (to disk) other users' StarOffice docs when they're stable for 5 
minutes.
- Back up (to disk) all other files when they're stable for 5 hours.
- Make a second backup (to tape) of all files when they're stable for 24 hours.
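The example policies can be expressed as a small rule table. In this hypothetical Python sketch, the exclusion is checked first, the disk rules are evaluated in order with the first match winning (which is what "all other files" implies), and the tape rule applies to every file that isn't excluded. The extension test for StarOffice documents and the "ceo" owner name are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MINUTE = 60
HOUR = 60 * MINUTE

# Assumed file extensions for StarOffice documents (illustrative only).
OFFICE_EXTS = (".sxw", ".sxc", ".odt")


@dataclass
class Rule:
    matches: Callable[[str, str], bool]  # (path, owner) -> does the rule apply?
    target: str                          # "disk" or "tape"
    stable_for: int                      # seconds of stability required


def is_office_doc(path: str) -> bool:
    return path.endswith(OFFICE_EXTS)


EXCLUDE = [lambda path, owner: "/firefox/cache/" in path]

DISK_RULES = [  # evaluated in order; first match wins
    Rule(lambda p, o: o == "ceo" and is_office_doc(p), "disk", 1 * MINUTE),
    Rule(lambda p, o: is_office_doc(p), "disk", 5 * MINUTE),
    Rule(lambda p, o: True, "disk", 5 * HOUR),
]

TAPE_RULE = Rule(lambda p, o: True, "tape", 24 * HOUR)


def plan(path: str, owner: str) -> List[Tuple[str, int]]:
    """Return the (target, stable_for) backups a file should receive."""
    if any(excluded(path, owner) for excluded in EXCLUDE):
        return []
    result = []
    for rule in DISK_RULES:
        if rule.matches(path, owner):
            result.append((rule.target, rule.stable_for))
            break
    result.append((TAPE_RULE.target, TAPE_RULE.stable_for))
    return result
```

For example, plan("/home/bob/notes.sxw", "bob") returns [("disk", 300), ("tape", 86400)].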

Note how the file data stability time handles the file consistency issue 
without any knowledge of the application.  Pauses in file modification should 
generally occur when the data is consistent.  If not, we'll back it up again 
anyway after the next round of modifications.
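To see how "back it up again after the next round of modifications" plays out, here is a hypothetical Python sketch that replays a list of modification timestamps against a single stability window and reports when backups would fire. It assumes a backup fires once the file has gone stable_for seconds with no further modification, so a burst of edits yields one backup rather than one per write.

```python
from typing import List


def backup_times(mod_times: List[float], stable_for: float) -> List[float]:
    """Given sorted modification timestamps, return the times at which a
    'back up once stable for stable_for seconds' policy would fire."""
    fires = []
    for i, t in enumerate(mod_times):
        next_t = mod_times[i + 1] if i + 1 < len(mod_times) else None
        # Fire only if no further modification arrives within the window.
        if next_t is None or next_t - t >= stable_for:
            fires.append(t + stable_for)
    return fires
```

With edits at 0, 10, 20, 1000 and 1005 seconds and a 300-second window, backup_times([0, 10, 20, 1000, 1005], 300) returns [320, 1305]: one backup per burst of activity.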

The overhead introduced by ADM is less than you might imagine...  ADM/DMAPI 
can enable specific event types on a per-filesystem-object basis, so the 
versatility of the policies above does not come at the expense of excess 
chatter.  ADM's evaluation of a file is triggered by a change or close event, 
so we look only when there is reason to believe we have work to do.
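The per-object event filtering can be modeled roughly as follows. This is a toy Python sketch with hypothetical names; it does not use the real DMAPI interface. It only shows the idea that objects which never registered for events (a Firefox cache file, say) produce no work at all, and that only change and close events trigger a policy evaluation.

```python
from typing import Dict, List, Set, Tuple


class EventDispatcher:
    """Toy model of per-object event registration (hypothetical; not the
    DMAPI interface). Only objects that opted in generate any work."""

    TRIGGERS = {"change", "close"}  # events that cause a policy evaluation

    def __init__(self) -> None:
        self.enabled: Dict[str, Set[str]] = {}
        self.evaluated: List[Tuple[str, str]] = []

    def enable(self, path: str, events: Set[str]) -> None:
        """Register interest in the given event types for one object."""
        self.enabled[path] = set(events)

    def deliver(self, path: str, event: str) -> bool:
        """Return True if this event caused a policy evaluation."""
        if event not in self.enabled.get(path, set()):
            return False  # not registered: no message, no overhead
        if event in self.TRIGGERS:
            self.evaluated.append((path, event))
            return True
        return False
```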

ADM has several benefits relevant to this discussion:
- Automated management of the thousands/millions of backups.  How many to keep, 
should they be migrated from disk to tape, etc.
- Automated reclaiming & reuse of media used for backups.
- No burden of maintaining the entire write stream.
- No requirement for application support.
- For most file access patterns, we should make good guesses about when the 
data is consistent.
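The first two benefits (deciding how many backups to keep, migrating them from disk to tape, and reclaiming the rest) amount to a retention policy. Here is a minimal Python sketch under assumed parameters: keep is the number of newest versions to retain, and migrate_after is how long a retained version stays on disk before moving to tape. Both names are hypothetical, not ADM settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Version:
    created: float  # seconds since epoch
    tier: str       # "disk" or "tape"


def apply_retention(versions: List[Version], keep: int,
                    migrate_after: float,
                    now: float) -> Tuple[List[Version], List[Version]]:
    """Keep the newest `keep` versions and mark retained disk copies older
    than `migrate_after` seconds for tape (mutating their tier in place).
    Everything else is returned as reclaimable."""
    ordered = sorted(versions, key=lambda v: v.created, reverse=True)
    kept, reclaimed = ordered[:keep], ordered[keep:]
    for v in kept:
        if v.tier == "disk" and now - v.created >= migrate_after:
            v.tier = "tape"
    return kept, reclaimed
```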

If you're willing to give up the “last mile” requirement of CDP, ADM is a fairly 
cheap way to give you a lot of what you want.  Thoughts?

(In ADM we use the term “archive”, but here I'm using the term “backup” since 
that's what you're using.)

-Joe
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
