The idea is to write a script that makes a differential backup of the fileserver through a
network-mounted directory. The script would run about 1-3 times a day. On each run it creates a
current-state directory with the date and time in its name, and also a directory that contains only
the differences.
In my implementation, the very first run does a plain DirSync to get a full backup. On later runs (when
at least one full backup already exists) I create a hardlink copy of the latest backup and synchronize
it with the original using DirSync. Then I build the differential directory by hardlinking the files that
DirSync reported as changed or deleted.
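The core of the hardlink copy is roughly this (a simplified sketch; make_hardlink_copy is only an
illustrative name, the real code is the hardlinkmirror() sub in the attached script):

use File::Find ();
use File::Path qw(mkpath);

# Build $new as a hardlink copy of the previous snapshot $prev:
# directories are recreated, regular files are hardlinked, so the copy
# takes almost no extra disk space until a file actually changes.
sub make_hardlink_copy {
    my ($prev, $new) = @_;
    mkpath($new) unless -e $new;
    File::Find::find(sub {
        my $src = $File::Find::name;
        (my $dst = $src) =~ s[^\Q$prev\E][$new];
        if (-d $src) {
            mkpath($dst) unless -e $dst;
        } elsif (-f $src) {
            link $src, $dst or warn "link $src -> $dst failed: $!\n";
        }
    }, $prev);
}

After this step the new snapshot is synchronized against the live source with File::DirSync, so only
files that actually changed take up new space.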
I tested the script on a small amount of data and it worked fine. We then gave it a bigger stress test,
using more than 90,000 files and about 9 GB of data, to see what happens. The first backup took about
53 minutes, which is acceptable. The second run (without big changes to the original files) took about
15 minutes, which is also not bad, since that time includes the hardlinking and the comparison with the
originals. Then I started running it in a loop and realized that the run times get longer and longer
after every run, without any obvious reason.
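One way to see which step is responsible would be to time each phase separately, e.g. with a small
helper like this (a sketch using Time::HiRes; the commented example calls use the sub names from the
script attached below):

use Time::HiRes qw(gettimeofday tv_interval);

# Run a code block and report how long it took, so each phase
# (hardlink mirror, rebuild, dirsync, diff) can be measured separately.
sub timed {
    my ($label, $code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    printf STDERR "%-16s %7.1f s\n", $label, tv_interval($t0);
}

# Example usage with the subs from the attached script:
# timed('hardlinkmirror', sub { hardlinkmirror($backupdir{max}, $dstdir) });
# timed('sync',           sub { sync($id, \%set, $dstdir, \$ds) });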
Here is the log file my script generated, so you can see how the run time increases:
2004-01-17+18:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/cd_new/software/internet/Mozilla/calendar_windows_1.5_20040109.xpi
/mp3/Metallica - Black/10 - Track 10.mp3
0 deleted
2004-01-17+19:00 : BACKUPSYNC finished

2004-01-17+19:10 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-17+19:32 : BACKUPSYNC finished

2004-01-17+19:42 : BACKUPSYNC started
Report on [sdi] /mnt/smb
1 updated
/cd_new/software/ALL_WinFastPVR_40212.exe
0 deleted
2004-01-17+20:09 : BACKUPSYNC finished

2004-01-17+20:19 : BACKUPSYNC started
Report on [sdi] /mnt/smb
6 updated
/cd_new/Drivers/Video/LeadtekTVExpert
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_DRV_88X_31124.zip
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_DRV_88X_40212.zip
/cd_new/Drivers/Video/LeadtekTVExpert/ALL_WinFastPVR_40212.exe
/szemet/ajanlat.sxw
/szemet/TL1000.avi
0 deleted
2004-01-17+20:49 : BACKUPSYNC finished

2004-01-17+20:59 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/ajanlat.sxw
/szemet/TL1000.avi
0 deleted
2004-01-17+21:35 : BACKUPSYNC finished

2004-01-17+21:45 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/árvíztűrőtükörfúrógép.txt
/szemet/toni_01.avi
0 deleted
2004-01-17+22:26 : BACKUPSYNC finished

2004-01-17+22:36 : BACKUPSYNC started
Report on [sdi] /mnt/smb
2 updated
/szemet/árvíztűrőtükörfúrógép.txt
/szemet/toni_01.avi
0 deleted
2004-01-17+23:19 : BACKUPSYNC finished

2004-01-17+23:29 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+00:15 : BACKUPSYNC finished

2004-01-18+00:25 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+01:17 : BACKUPSYNC finished

2004-01-18+01:27 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+02:21 : BACKUPSYNC finished

2004-01-18+02:31 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+03:33 : BACKUPSYNC finished

2004-01-18+03:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+04:47 : BACKUPSYNC finished

2004-01-18+04:57 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+06:05 : BACKUPSYNC finished

2004-01-18+06:15 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+08:35 : BACKUPSYNC finished

2004-01-18+08:45 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+10:04 : BACKUPSYNC finished

2004-01-18+10:14 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+11:33 : BACKUPSYNC finished

2004-01-18+11:43 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+13:12 : BACKUPSYNC finished

2004-01-18+13:22 : BACKUPSYNC started
Report on [sdi] /mnt/smb
0 updated
0 deleted
2004-01-18+14:49 : BACKUPSYNC finished

2004-01-18+14:59 : BACKUPSYNC started
Report on [sdi] /mnt/smb
1 updated
/uzlet/munkalistak/kontakt.sdc
0 deleted
2004-01-18+16:34 : BACKUPSYNC finished
The script is attached. I'm not a Perl pro, so all advice is appreciated. Please help me figure out
what the problem is that makes the script run longer and longer every time. The second thing that
caught my attention is that some files get synchronized by DirSync even though they were not
modified (maybe because of an access or an open).
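One thing I can check there is whether the SMB mount reports a different modification time than the
backed-up copy (in case DirSync decides from the mtime); a throwaway comparison would look like this
(a sketch; the two paths are given on the command line, the example paths in the comment are
illustrative only):

#!/usr/bin/perl -w
use strict;

# Compare the modification times of a source file and its backup copy,
# e.g.: mtimecheck.pl /mnt/smb/some/file /mnt/backup/act/sdi/<date>/some/file
my ($src, $dst) = @ARGV;
my $src_mtime = (stat $src)[9];
my $dst_mtime = (stat $dst)[9];
printf "src mtime: %d  dst mtime: %d  difference: %d s\n",
       $src_mtime, $dst_mtime, $src_mtime - $dst_mtime;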
#!/usr/bin/perl -w
use strict;
use File::DirSync;
use File::Path;
use File::Find ();
use diagnostics;

# So-called security
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

# Config
###########
# Sources of the backup
my %set;
$set{"sdi"}{"srcdir"} = "/mnt/smb";
# These directories are ignored by the backup.
# They are relative to srcdir.
# Example: if tmp/ is defined here, then it is
# /mnt/smb/szemet/project/tmp
$set{"sdi"}{"ign"} = [ "noroo", "Zotya", "tamas", "qnc", "tibu", "mentesek" ];
# An example second backup set
#$set{"etc"}{"srcdir"} = "/etc";
# If we have no directory to ignore, we define an empty array
#$set{"etc"}{"ign"} = [];

# Base directory of the backup destination
my $bdir = "/mnt/backup";
# Maximum number of backup slots
my $maxslots = 100;
my $admin    = '[EMAIL PROTECTED]';
my $email    = 0;
my $logfile  = "/var/log/backupsync.log";

# From here on nothing needs to be modified
############################################
# DEBUG
# general
my $debug = 0;
# ignore debug
my $idebug = 0;
# hardlink mirror debug
my $hmdebug = 0;
# DirSync internal messages
my $sdebug = 0;
############################################
# Script return code
my $status = 0;
# Directory sync variables
my $ds;
my @upd;
my @del;
my @ski;
my @err;

# CDATE
# Generates a directory name from the current date and time
#############################################################
sub cdate() {
    my ($second, $minute, $hour, $day, $month, $year,
        $weekday, $dayofyear, $isdst) = localtime(time);
    $year  = $year + 1900;
    $month = $month + 1;          # localtime() months are 0-based
    if ($day    < 10) { $day    = "0" . $day; }
    if ($month  < 10) { $month  = "0" . $month; }
    if ($hour   < 10) { $hour   = "0" . $hour; }
    if ($minute < 10) { $minute = "0" . $minute; }
    # This is the format we will use for differential backup directory names
    # return "$year-$month-$day";
    return "$year-$month-$day+$hour:$minute";
}

# HARDLINKMIRROR
# Populates a directory with hardlinks from the last backup
# to have a virtual copy we can synchronize with
############################################################
sub hardlinkmirror($$) {
    my $srcdir = shift;
    my $dstdir = shift;
    mkpath $dstdir;
    use vars qw/*name *dir *prune/;
    *name  = *File::Find::name;
    *dir   = *File::Find::dir;
    *prune = *File::Find::prune;
    print "HARDLINK : $srcdir -> $dstdir\n" if $hmdebug == 1;
    my $wanted = sub {
        /^.*\z/s;
        my $src = $File::Find::name;
        my $dst = $File::Find::name;
        $dst =~ s[^\Q$srcdir][$dstdir]g
            || die "$File::Find::name does not start with $srcdir\n";
        if ( -f $src ) {
            # If it is a file, we link it
            link $src, $dst;
            print "Linked -> $dst\n" if $hmdebug == 1;
            push @del, $dst if $dstdir =~ /deleted/;
        } else {
            # If it is a directory, we create it
            # unless it already exists (DIFF may have created it
            # if it was deleted)
            if (!( -e $dst )) {
                mkpath $dst;
                print "Mkdir -> $dst\n" if $hmdebug == 1;
                push @del, $dst if $dstdir =~ /deleted/;
            }
        }
        print "SRC: $src\n" if $hmdebug == 1;
    };
    # Traverse the src directory
    File::Find::find({ wanted => \&$wanted }, $srcdir);
}

# SYNC
# Synchronizes the given set with the destination, using DirSync
#################################################################
sub sync($$$$) {
    my $id     = shift;
    my $set    = shift;
    my $srcdir = $set{$id}{srcdir};
    my $dstdir = shift;
    my $ds     = shift;
    # DirSync settings
    $ds = new File::DirSync {
        # Internal messages
        verbose => $sdebug,
        # If we do not have write access to SRC
        # then we cannot build the cache,
        # so the value should be 1 then, but that slows down
        # the synchronization
        nocache => 0,
        # If it is set to 1, it is not recursive
        localmode => 0,
    };
    print "SYNC: $srcdir <-> $dstdir\n" if $debug == 1;
    $ds->src($srcdir);
    $ds->dst($dstdir);
    # If we do not want to back up a subdirectory of our set
    foreach my $ign ( @{$set{$id}{"ign"}} ) {
        print "IGNORE: $ign\n" if $idebug == 1;
        $ds->ignore($ign);
    }
    $ds->rebuild();
    print "REBUILD DONE\n" if $debug == 1;
    $ds->dirsync();
    @upd = $ds->entries_updated();
    @del = $ds->entries_removed();
    @ski = $ds->entries_skipped();
    @err = $ds->entries_failed();
}

# GETMM
# Returns the oldest and the newest backup directories, and the
# number of backups already made
####################################################################
sub getmm ($$) {
    my $dstdir = shift;
    my $list   = shift;
    my @var;
    my $wanted = sub {
        -d _;
        my $dir = $File::Find::name;
        $dir =~ s[^\Q$dstdir][]g
            || die "$File::Find::name does not start with $dstdir\n";
        my $depth = $dir =~ tr[/][];
        if ( $depth == 1 ) {
            $dir =~ s[/][]g;
            if (length($dir) != 0 ) {
                my $value = $dir;
                $value =~ s[-|:|\+][]g;
                push @var, { 'dir' => $dir, 'value' => $value };
            }
        }
    };
    File::Find::find({ wanted => \&$wanted }, $dstdir ) if -e $dstdir;
    if (@var) {
        my $min = 0;
        my $max = 0;
        foreach my $i (1..$#var) {
            $max = $i if $var[$i]{"value"} > $var[$max]{"value"};
            $min = $i if $var[$i]{"value"} < $var[$min]{"value"};
        }
        $list->{min} = "$dstdir/$var[$min]{dir}";
        $list->{max} = "$dstdir/$var[$max]{dir}";
        $list->{num} = @var;
    } else {
        $list->{num} = 0;
    }
}

# DIFF
#######
sub diff ($$@) {
    # the dir we have run the sync on
    my $synced = shift;
    # last full backup dir on update (= $synced)
    # last-1 full backup dir on delete
    my $srcbase = shift;
    # differential destination dir
    my $dstbase = shift;
    # updated or deleted files
    my @files = @_;
    mkpath $dstbase unless -e $dstbase;
    foreach my $src (@files) {
        my $dst = $src;
        $dst =~ s[^\Q$synced][$dstbase]g;
        # We have to replace the source base with the last-1 full
        # backup if the file was deleted
        $src =~ s[^\Q$synced][$srcbase]g if $synced !~ $srcbase;
        my ($dstdir) = $dst =~ m/(.*\/).*$/;
        mkpath $dstdir unless -e $dstdir;
        if ( -f $src ) {
            link $src, $dst;
        } else {
            mkpath ($dst) unless -e $dst;
            # We only need to hardlink the files under a directory
            # if it was deleted; otherwise (on update) the
            # files are returned in the @upd variable
            hardlinkmirror($src, $dst) if $synced !~ $srcbase;
        }
    }
}

# REPORT
###################
sub report ($$) {
    my $id  = shift;
    my $set = shift;
    my @log;
    push @log, "Report on [$id] " . $set{$id}{srcdir} . "\n";
    push @log, scalar @upd . " updated\n";
    foreach my $updated (@upd) {
        $updated =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
        if ($updated) {
            push @log, "$updated\n";
        } else {
            push @log, "/\n";
        }
    }
    push @log, scalar @del . " deleted\n";
    foreach my $deleted (@del) {
        $deleted =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
        $deleted =~ s[/deleted][]g;
        push @log, "$deleted\n";
    }
    if ( ( @ski > 0 ) && ( $debug == 1 ) ) {
        push @log, scalar @ski . " not modified\n";
        foreach my $skipped (@ski) {
            $skipped =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
            push @log, "$skipped\n";
        }
    }
    if (@err > 0) {
        push @log, scalar @err . " ERRORS occurred during backup!\n";
        foreach my $error (@err) {
            $error =~ s[^$bdir/\S+/$id/\d{4}-\d{2}-\d{2}\+\d{2}:\d{2}][]g;
            push @log, "$error\n";
            $status = 1;
        }
    }
    if ( $email == 1 ) {
        open(MAILIT, "|mail $admin -s 'BackupSync Report'");
        print MAILIT @log;
        close MAILIT;
    }
    print LOG @log;
}

# MAIN
########
open(LOG, ">>$logfile");
my $now = cdate();
print LOG "$now : BACKUPSYNC started\n";
foreach our $id (keys(%set)) {
    if ( -e $set{$id}{srcdir} ) {
        mkpath "$bdir/act/$id"  unless -e "$bdir/act/$id";
        mkpath "$bdir/diff/$id" unless -e "$bdir/diff/$id";
        getmm("$bdir/act/$id", \my %backupdir);
        my $dstdir = "$bdir/act/$id/$now";
        if ( $backupdir{num} == 0 ) {
            # If we do not have any backup at all yet
            sync($id, \%set, $dstdir, \$ds);
            report($id, \%set);
        } else {
            # If we already have at least one (full) backup
            if ( !( -e $dstdir ) ) {
                hardlinkmirror($backupdir{max}, $dstdir);
                sync($id, \%set, $dstdir, \$ds);
                my $srcdir = $dstdir;
                if (@del > 0) {
                    $dstdir = "$bdir/diff/$id/$now/deleted";
                    diff($srcdir, $backupdir{max}, $dstdir, @del);
                }
                if (@upd > 0) {
                    $dstdir = "$bdir/diff/$id/$now/updated";
                    diff($srcdir, $srcdir, $dstdir, @upd);
                }
                report($id, \%set);
            } else {
                # A backup has already been made in this minute
                print "We already have a differential backup in this minute: $now\n";
                $status = 1;
            }
            rmtree $backupdir{min} if $backupdir{num} > $maxslots;
        }
    } else {
        print "ERROR: the source $set{$id}{srcdir} is not accessible!\n";
        $status = 1;
    }
}
$now = cdate();
print LOG "$now : BACKUPSYNC finished\n";
close LOG;
# If an error has occurred, return a non-zero exit code
exit(1) if ($status == 1);