I'm looking for feedback on a scheme for dynamic filesets. We back up to LTO4, on a schedule where each fileset gets a Full every 2 months, a Differential weekly, and Incrementals nightly. I'm willing to change the Differential backup to monthly (alternating with Fulls).

-The Problem-

As our data volume has grown (now about 45TB being backed up), managing filesets has become more complex. We've gone through a typical progression, from backing up "everything", to backing up high-level logical groupings (i.e., all "home dirs" and all "projects") as separate filesets, then to smaller sets (i.e., "home dirs beginning A-F", "projects beginning [G-Kg-i0-4]", etc.).

The current filesets are very unbalanced (over 12TB in some). I am reluctant to manually create individual filesets for each group of directories alphabetically (i.e., "project dirs starting with A", "home with B", "source code dirs with C", "collaborator dirs with D", etc.) because this would result in ~100 filesets, and they'd still be very unbalanced in terms of backup volume, due to the uneven distribution of project and user names.

-Proposed Solution-

The new scheme I'm considering would use a dynamic fileset, generated each night, to define 3 backup jobs: Full, Differential, and Incremental. A program would select all directories to be backed up (/home/*, /projects/*, /src/*, /collab/*) and determine the backup level for each.

For each directory to be backed up, the path to the directory is hashed to a number within the range 1-56. The choice of 56 corresponds to double the number of days in February, and allows us to alternate Fulls & Differentials each month for a given directory.
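
As a rough sketch of the hashing step (not a finished implementation), something like the following would map a path to a stable number in 1-56. The function name `directory_bucket` and the choice of SHA-1 are just assumptions for illustration; any hash that gives the same result on every run would do.

```python
import hashlib

NUM_BUCKETS = 56  # 2 x 28, per the scheme described above


def directory_bucket(path: str, buckets: int = NUM_BUCKETS) -> int:
    """Map a directory path to a stable number in the range 1..buckets.

    Hypothetical helper: a digest is used instead of Python's built-in
    hash(), because hash() on strings is salted per process and would
    move directories between buckets from one night to the next.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return value % buckets + 1


if __name__ == "__main__":
    for d in ("/projects/Bird", "/home/Byrd"):
        print(d, directory_bucket(d))
```

The hash values 7 and 44 used in the examples are illustrative only; this sketch will produce whatever numbers the digest happens to give for those paths.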

For each directory, the backup level is determined by:

    Full backup if:
        (Current month is Odd) AND (Directory hash <= 28) AND
            Day of the month == Directory hash
      OR
        (Current month is Even) AND (Directory hash > 28) AND
            Day of the month == (Directory hash - 28)

    Differential backup if:
        (Current month is Odd) AND (Directory hash > 28) AND
            Day of the month == (Directory hash - 28)
      OR
        (Current month is Even) AND (Directory hash <= 28) AND
            Day of the month == Directory hash

    Incremental backup if:
        (Directory hash <= 28) AND Day of the month != Directory hash
      OR
        (Directory hash > 28) AND Day of the month != (Directory hash - 28)

For example, if the directory "/projects/Bird" had a hash value of 7, it would get:

    Full backups:         Jan 7, Mar 7, May 7, Jul 7, Sep 7, Nov 7
    Differential backups: Feb 7, Apr 7, Jun 7, Aug 7, Oct 7, Dec 7
    Incremental backups on all other nights

For example, if the directory "/home/Byrd" had a hash value of 44 (44 - 28 = 16), it would get:

    Full backups:         Feb 16, Apr 16, Jun 16, Aug 16, Oct 16, Dec 16
    Differential backups: Jan 16, Mar 16, May 16, Jul 16, Sep 16, Nov 16
    Incremental backups on all other nights

Full & Differential backups would not be started on days 29-31 of any month.

This should make each nightly backup volume smaller, by running more Full jobs per month, each of a smaller size.

The downsides to this scheme that I see are increased complexity and greater uncertainty about when a particular directory got a Full or Differential backup.

What do you think of this scheme?

Thanks,

Mark

--
Mark Bergman                              voice: 215-662-7310
mark.berg...@uphs.upenn.edu                 fax: 215-614-0266
http://www.cbica.upenn.edu/
IT Technical Director, Center for Biomedical Image Computing and Analytics
Department of Radiology                   University of Pennsylvania
PGP Key: http://www.cbica.upenn.edu/sbia/bergman
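
P.S. To make the level rules concrete, here is a minimal sketch of the decision for one directory on one night. The name `backup_level` is made up for illustration, and the sketch assumes a hash of exactly 28 belongs to the lower half (1-28) and that a directory is only ever compared against its own slot day (the hash itself, or the hash minus 28), so days 29-31 always come out as Incremental.

```python
import datetime


def backup_level(bucket: int, day: datetime.date) -> str:
    """Return "Full", "Differential", or "Incremental" for one directory.

    bucket is the directory's hash value in the range 1..56.
    Hypothetical helper sketching the rules described in this message.
    """
    if not 1 <= bucket <= 56:
        raise ValueError("bucket must be in the range 1..56")

    low_half = bucket <= 28
    # Day of the month on which this directory gets its Full/Differential.
    slot_day = bucket if low_half else bucket - 28

    if day.day != slot_day:
        # Includes days 29-31, which never match a slot day.
        return "Incremental"

    odd_month = day.month % 2 == 1
    # Lower half: Full in odd months; upper half: Full in even months.
    return "Full" if odd_month == low_half else "Differential"


if __name__ == "__main__":
    print(backup_level(7, datetime.date(2024, 1, 7)))
    print(backup_level(44, datetime.date(2024, 1, 16)))
```

A nightly generator could call this for each selected directory and write the path into the matching Full, Differential, or Incremental fileset.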
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users