This e-mail is probably not really aimed at beginners, but since I am a beginner, I am posting it here.

The idea is a script that makes a differential backup of our file server through
a network-mounted directory. The script will run about 1-3 times a day. On each run it creates a
snapshot directory with the current date and time in its name, and also a directory containing only
the differences.
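
A minimal sketch of that naming scheme (the `snapshot_name` helper is my own illustration, not part of the attached script, which builds the same format by hand):

```perl
use POSIX qw(strftime);

# Produce a snapshot directory name such as "2004-01-17+18:43".
# strftime() zero-pads every field for us.
sub snapshot_name {
    my $t = @_ ? shift : time;
    return strftime "%Y-%m-%d+%H:%M", localtime($t);
}

print snapshot_name(), "\n";
```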


In the implementation, the first run does a plain DirSync to get a full backup. On later runs (when
at least one full backup already exists) I create a hardlink copy of the latest backup and synchronize
it with the original using DirSync. Then I hardlink the differential directory based on the lists
DirSync returns.
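
The hardlink copy is what keeps the snapshots cheap: a hardlink is just another directory entry for the same inode, so unchanged files take no extra disk space. A small standalone demonstration (the temporary paths are made up for the example):

```perl
use File::Temp qw(tempdir);

# A hardlink adds a second name for an existing inode;
# stat() field 3 reports how many names point at it.
my $dir = tempdir(CLEANUP => 1);

open my $fh, '>', "$dir/original" or die "open: $!";
print $fh "payload\n";
close $fh;

link "$dir/original", "$dir/snapshot" or die "link: $!";

my $links = (stat "$dir/original")[3];
print "link count: $links\n";    # 2: both names share one inode
```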


I tested the script on a small amount of data and it worked fine. We then decided to give it a bigger
stress test, using more than 90,000 files and about 9 GB of data to see what happens.
The first backup took about 53 minutes, which seems fine. The second run (without big changes to
the original files) took about 15 minutes, which is also not bad, since that time includes the
hardlinking and the comparison with the originals. Then I started running it in a loop, and
realized that the runs keep getting longer and longer, without any obvious reason.
Here is the log file my script generates, so you can see how the run time increases:



2004-01-17+18:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/cd_new/software/internet/Mozilla/calendar_windows_1.5_20040109.xpi
/mp3/Metallica - Black/10 - Track 10.mp3
0 deleted
2004-01-17+19:00 : BACKUPSYNC finished

2004-01-17+19:10 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-17+19:32 : BACKUPSYNC finished

2004-01-17+19:42 : BACKUPSYNC started
Report on [sdi] /mnt/smb
1 updated
/cd_new/software/ALL_WinFastPVR_40212.exe
0 deleted
2004-01-17+20:09 : BACKUPSYNC finished

2004-01-17+20:19 : BACKUPSYNC started
Report on [sdi] /mnt/smb
6 updated
/cd_new/Drivers/Video/LeadtekTVExpert
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_DRV_88X_31124.zip
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_DRV_88X_40212.zip
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_WinFastPVR_40212.exe
/szemet/ajanlat.sxw
/szemet/TL1000.avi
0 deleted
2004-01-17+20:49 : BACKUPSYNC finished

2004-01-17+20:59 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/ajanlat.sxw
/szemet/TL1000.avi
0 deleted
2004-01-17+21:35 : BACKUPSYNC finished

2004-01-17+21:45 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/árvíztűrőtükörfúrógép.txt
/szemet/toni_01.avi
0 deleted
2004-01-17+22:26 : BACKUPSYNC finished

2004-01-17+22:36 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/árvíztűrőtükörfúrógép.txt
/szemet/toni_01.avi
0 deleted
2004-01-17+23:19 : BACKUPSYNC finished

2004-01-17+23:29 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+00:15 : BACKUPSYNC finished

2004-01-18+00:25 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+01:17 : BACKUPSYNC finished

2004-01-18+01:27 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+02:21 : BACKUPSYNC finished

2004-01-18+02:31 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+03:33 : BACKUPSYNC finished

2004-01-18+03:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+04:47 : BACKUPSYNC finished

2004-01-18+04:57 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+06:05 : BACKUPSYNC finished

2004-01-18+06:15 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+08:35 : BACKUPSYNC finished

2004-01-18+08:45 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+10:04 : BACKUPSYNC finished

2004-01-18+10:14 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+11:33 : BACKUPSYNC finished

2004-01-18+11:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+13:12 : BACKUPSYNC finished

2004-01-18+13:22 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+14:49 : BACKUPSYNC finished

2004-01-18+14:59 : BACKUPSYNC started
Report on [sdi] /mnt/smb
1 updated
/uzlet/munkalistak/kontakt.sdc
0 deleted
2004-01-18+16:34 : BACKUPSYNC finished

The script is attached. I'm not a pro in Perl, so all advice is appreciated. The second thing that came to my
attention is that some files get synchronized by DirSync even though they were not modified (maybe
because they were accessed or opened). Please help me figure out what makes the script
run longer and longer every time.
#!/usr/bin/perl -w
use strict;
use File::DirSync;
use File::Path;
use File::Find ();
use diagnostics;

# Basic security precautions
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

# Config
###########

# Sources of Backup
my %set;
$set{"sdi"}{"srcdir"} = "/mnt/smb";
# These directories are ignored by the backup;
# they are relative to srcdir.
# Example: if tmp/ is listed here, then
# /mnt/smb/szemet/project/tmp is skipped
$set{"sdi"}{"ign"} = [ "noroo", "Zotya", "tamas", "qnc", "tibu" , "mentesek" ];
# an example second backup set
#$set{"etc"}{"srcdir"} = "/etc";
# If there is no directory to ignore, then define
# an empty array
#$set{"etc"}{"ign"} = []; 

# Basedir of destination for backups
my $bdir = "/mnt/backup";

# Maximum number of backup slots
my $maxslots=100;

my $admin='[EMAIL PROTECTED]';
my $email=0;

my $logfile="/var/log/backupsync.log";

# Nothing below this point should need to be modified
############################################
# DEBUG 
# general
my $debug = 0;
# ignore debug
my $idebug = 0;
# hardlink mirror debug
my $hmdebug = 0;
# DirSync Internal messages
my $sdebug = 0;

############################################

# Script Return Code
my $status = 0;

# Directory Sync Variables
my $ds;
my @upd;
my @del;
my @ski;
my @err;

# CDATE
# Generates a directory name from the current date and time
#############################################################

sub cdate() {
  my ($second, $minute, $hour, $day, $month, $year) = localtime(time);
  $year  += 1900;
  $month += 1;   # localtime() months are 0-based (January == 0)
  # This is the format we use for the backup directory names
  return sprintf "%04d-%02d-%02d+%02d:%02d", $year, $month, $day, $hour, $minute;
}

# HARDLINKMIRROR
# Populates a directory with hardlinks from the last backup
# to get a virtual copy we can synchronize with
############################################################

sub hardlinkmirror($$) {
  my $srcdir=shift;
  my $dstdir = shift;
  mkpath $dstdir;
  use vars qw/*name *dir *prune/;
  *name   = *File::Find::name;
  *dir    = *File::Find::dir;
  *prune  = *File::Find::prune;
  print "HARDLINK : $srcdir -> $dstdir\n" if $hmdebug == 1;
  my $wanted = sub {
     /^.*\z/s;
     my $src = $File::Find::name;
     my $dst = $File::Find::name;
     $dst =~ s[^\Q$srcdir][$dstdir]g || die "$File::Find::name does not start with $srcdir\n";
     if ( -f  $src ) {
       # If it is a file we should link it
       link $src, $dst or warn "Could not link $dst: $!\n";
       print "Linked ->  $dst\n" if $hmdebug == 1;
       push @del,$dst if $dstdir =~ /deleted/;
     } else {
       # If it is a directory we should create it,
       # unless it already exists (DIFF may have
       # created it if something was deleted)
       if (!( -e $dst)) {
         mkpath $dst;
         print "Mkdir ->   $dst\n" if $hmdebug == 1;
         push @del,$dst if $dstdir =~ /deleted/;
       }
     }
     print "SRC:       $src\n" if $hmdebug == 1;
   };
   # src directory find
   File::Find::find({wanted => \&$wanted}, $srcdir);
}

# SYNC
# Synchronizes the given set with the destination, using DirSync
#################################################################

sub sync($$$$) {
  my $id     = shift;
  my $set    = shift;
  my $srcdir = $set->{$id}{srcdir};
  my $dstdir = shift;
  my $ds     = shift;   # scalar ref from the caller; replaced by a new object below
  # DirSync settings
  $ds = new File::DirSync {
  # Internal messages
    verbose => $sdebug,
  # If we do not have write access to SRC
  # then we cannot build the cache,
  # so the value should be 1 in that case,
  # but that slows down the synchronization
    nocache => 0,
  # If it is set to 1, it is not recursive
    localmode => 0,
  };
  print "SYNC: $srcdir <-> $dstdir\n" if $debug == 1; 
  $ds->src($srcdir);
  $ds->dst($dstdir);

  # If we do not want to backup a subdirectory of our set
  foreach my $ign ( @{$set->{$id}{"ign"}} ) {
    print "IGNORE: $ign\n" if $idebug == 1;
    $ds->ignore($ign);
  }
  
  $ds->rebuild();
  print "REBUILD DONE\n" if $debug == 1;
  $ds->dirsync();
  @upd=$ds->entries_updated();
  @del=$ds->entries_removed();
  @ski=$ds->entries_skipped();
  @err=$ds->entries_failed();
}

# GETMM
# Returns the oldest and the newest backup directories, plus the
# number of backups already made
####################################################################

sub getmm ($$) {
  my $dstdir = shift;
  my $list   = shift;
  my @var;

  # Read only the top-level snapshot names with readdir() instead of a
  # full File::Find traversal.  find() descended into every snapshot
  # tree (tens of thousands of hardlinks each), so that scan grew with
  # every backup taken, which made each run slower than the last.
  if (opendir my $dh, $dstdir) {
    foreach my $dir (readdir $dh) {
      next if $dir eq '.' || $dir eq '..';
      next unless -d "$dstdir/$dir";
      my $value = $dir;
      $value =~ s/[-:+]//g;
      push @var, { dir => $dir, value => $value };
    }
    closedir $dh;
  }
  
  if (@var) {
    my $min=0;
    my $max=0;
    foreach my $i (1..$#var) {
       $max=$i if $var[$i]{"value"} > $var[$max]{"value"};
       $min=$i if $var[$i]{"value"} < $var[$min]{"value"};
    }
    $list->{min}="$dstdir/$var[$min]{dir}";
    $list->{max}="$dstdir/$var[$max]{dir}";
    $list->{num} = scalar @var;
  } else {
    $list->{num}=0;
  }
}

# DIFF
#######

sub diff ($$@) {
  # the dir we have created the sync on
  my $synced=shift;
  # last full backup dir on update (=$synced)
  # last-1 full backup dir on delete
  my $srcbase=shift;
  # differential destination dir
  my $dstbase=shift;
  # updated or deleted files
  my @files = @_;
  mkpath $dstbase unless -e $dstbase;
  foreach my $src (@files) {
    my $dst = $src;
    $dst =~ s[^\Q$synced][$dstbase]g;
    # We have to repoint the source base at the previous full
    # backup if the file was deleted
    $src =~ s[^\Q$synced][$srcbase]g if $synced ne $srcbase;
    my ($dstdir) = $dst =~ m/(.*\/).*$/;
    mkpath $dstdir unless -e $dstdir;
    if ( -f  $src ) {
      link $src,$dst; 
    } else {
      mkpath ($dst)  unless -e $dst;
      # we only need to hardlink the files under a directory
      # if it was deleted; otherwise (on update) the
      # files are listed individually in @upd
      hardlinkmirror($src,$dst) if $synced ne $srcbase;
    }
  }
}

# REPORT
###################

sub report ($$) {
  my $id = shift;
  my $set = shift;
  my @log;
  push @log, "Report on [$id] ".$set->{$id}{srcdir}."\n";
  push @log, scalar @upd." files updated\n";
  foreach my $updated (@upd) {
    $updated =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
    if ($updated) {   
      push @log, "$updated\n";
    } else { 
      push @log, "/\n";
    }
  }
  push @log, scalar @del." files deleted\n";
  foreach my $deleted (@del) {
    $deleted =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
    $deleted =~ s[/deleted][]g;
    push @log, "$deleted\n";
  }
  if ( ( @ski > 0 ) && ( $debug == 1 ) ) {
    push @log, scalar @ski." files unchanged\n";
    foreach my $skipped (@ski) {
      $skipped =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
      push @log, "$skipped\n";
    }
  }
  if (@err > 0) {
    push @log, scalar @err." files FAILED during backup!\n";
    foreach my $error (@err) {
      $error =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
      push @log, "$error\n";
      $status=1;
    }
  }
  if ( $email == 1 ) {
     open(MAILIT,"|mail -s 'BackupSync Report' $admin") or warn "Cannot start mail: $!\n";
     print MAILIT @log;
     close MAILIT;
  }
  print LOG @log;
}

# MAIN
########
open(LOG,">>$logfile") or die "Cannot open logfile $logfile: $!\n";
my $now = cdate();
print LOG "$now : BACKUPSYNC started\n";
foreach my $id (keys(%set)) {
  if ( -e $set{$id}{srcdir}) {
    mkpath "$bdir/act/$id" unless -e "$bdir/act/$id";
    mkpath "$bdir/diff/$id" unless -e "$bdir/diff/$id";
    getmm("$bdir/act/$id",\my %backupdir);
    my $dstdir = "$bdir/act/$id/$now";
    if ( $backupdir{num} == 0 ) {
      # If we do not have backup at all yet
      sync($id,\%set,$dstdir,\$ds);
      report($id,\%set);
    } else {     
      # If we already have at least one (full) backup
      if ( !( -e $dstdir ) ) {           
        hardlinkmirror($backupdir{max},$dstdir);    
        sync($id,\%set,$dstdir,\$ds);
        my $srcdir=$dstdir;
        if (@del > 0) {
          $dstdir = "$bdir/diff/$id/$now/deleted";
          diff ($srcdir,$backupdir{max},$dstdir,@del);
        }
        if (@upd > 0) {
          $dstdir = "$bdir/diff/$id/$now/updated";
          diff ($srcdir,$srcdir,$dstdir,@upd);
        }
        report($id,\%set);
      } else {
        # A backup has already been made in this minute
        print "We already have a differential backup for this minute: $now\n";
        $status=1;
      }
      rmtree $backupdir{min} if $backupdir{num} > $maxslots;
    } 
      
  } else {
    print "ERROR: the source $set{$id}{srcdir} is not accessible!\n"; 
    $status=1;
  }
}

$now = cdate();
print LOG "$now : BACKUPSYNC finished\n";
close LOG;

# If an error has occurred, return a non-zero exit code
exit(1) if ($status == 1);
