Package: rsnapshot
Version: 1.3.1-3
Severity: important
Tags: patch

Hi.

When the 'use_lazy_deletes' option is enabled, rsnapshot deletes the lock file twice: first before starting `rm -rf`, and a second time when `rm -rf` finishes. But by the second time the lockfile may no longer belong to this rsnapshot process.

Here is a test. I'll use wrappers around `rm` and `rsync` to insert a delay of 30 seconds before each run:

# cat rm_wait
#!/bin/sh

set -euf
sleep 30;
/bin/rm "$@"

and

# cat rsync_wait
#!/bin/sh

set -euf
sleep 30
/usr/bin/rsync "$@"

Then I limit 'hourly' backups to 2 copies (see the entire config at the end):

# ls -al storage/
total 16
drwx------ 4 root root 4096 Jul 16 14:16 .
drwx------ 4 root root 4096 Jul 16 14:16 ..
drwx------ 3 root root 4096 Jul 16 14:16 hourly.0
drwx------ 3 root root 4096 Jul 16 14:15 hourly.1

and start two rsnapshot instances from two terminals with some delay:

# rsnapshot -c ./rsnapshot.conf hourly ; ls -l rsnapshot.pid
ls: cannot access rsnapshot.pid: No such file or directory

(Note that there is no lock file after 'hourly' finishes, but there should be one, because at that point it is the lockfile of 'sync'.)

# rsnapshot -c ./rsnapshot.conf sync
#

First I start rsnapshot hourly:

[16/Jul/2014:14:21:21] /usr/bin/rsnapshot -c ./rsnapshot.conf hourly: started
[16/Jul/2014:14:21:21] Setting locale to POSIX "C"
[16/Jul/2014:14:21:21] echo 2255 > /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:21:21] mv /root/test_lazy_delete/storage/hourly.1/ /root/test_lazy_delete/storage/_delete.2255/
[16/Jul/2014:14:21:21] mv /root/test_lazy_delete/storage/hourly.0/ /root/test_lazy_delete/storage/hourly.1/
[16/Jul/2014:14:21:21] rm -f /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:21:21] /root/test_lazy_delete/rm_wait -rf /root/test_lazy_delete/storage/_delete.2255

and then, once 'hourly' has deleted its lockfile and started `rm -rf`, I start rsnapshot sync:

[16/Jul/2014:14:21:28] /usr/bin/rsnapshot -c ./rsnapshot.conf sync: started
[16/Jul/2014:14:21:28] Setting locale to POSIX "C"
[16/Jul/2014:14:21:28] echo 2262 > /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:21:28] mkdir -m 0755 -p /root/test_lazy_delete/storage/.sync/
[16/Jul/2014:14:21:28] /root/test_lazy_delete/rsync_wait -a --delete --numeric-ids --relative --delete-excluded --link-dest=/root/test_lazy_delete/storage/hourly.1/localhost/ /root/test_lazy_delete/./data /root/test_lazy_delete/storage/.sync/localhost/

The 'hourly' instance finishes first and deletes (again!) the lockfile, which now belongs to the 'sync' instance:

[16/Jul/2014:14:21:51] rm -f /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:21:51] /usr/bin/rsnapshot -c ./rsnapshot.conf hourly: completed successfully

Then the 'sync' instance finishes:

[16/Jul/2014:14:21:58] rsync succeeded
[16/Jul/2014:14:21:58] touch /root/test_lazy_delete/storage/.sync/
[16/Jul/2014:14:21:58] No directory to delete: /root/test_lazy_delete/storage/_delete.2262
[16/Jul/2014:14:21:58] No need to remove non-existent lock /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:21:58] /usr/bin/rsnapshot -c ./rsnapshot.conf sync: completed successfully

The problem is that rsnapshot always calls remove_lockfile() after handle_interval(), even if it has already deleted the lockfile because of 'use_lazy_deletes'. So here is a patch that, before removing the lockfile after handle_interval(), checks that the lockfile really belongs to this rsnapshot process (I don't know Perl, not even the syntax, so I just copy-pasted different chunks of rsnapshot code):

--- /usr/bin/rsnapshot	2011-07-09 18:39:45.000000000 +0400
+++ ./rsnapshot.patch	2014-07-16 14:45:31.334504669 +0400
@@ -287,7 +287,7 @@
 handle_interval( $cmd );
 
 # if we have a lockfile, remove it
-remove_lockfile();
+remove_lockfile_if_owner();
 
 # if we got this far, the program is done running
 # write to the log and syslog with the status of the outcome
@@ -2350,6 +2350,37 @@
 	return (1);
 }
 
+# Calls remove_lockfile(), if there is ours lockfile. Otherwise, do nothing
+# and return 1 (as remove_lockfile() does). Also, it can exit the program
+# with 1 if it can't read lockfile.
+sub remove_lockfile_if_owner {
+	if (!defined($config_vars{'lockfile'})) {
+		return (undef);
+	}
+
+	my $lockfile = $config_vars{'lockfile'};
+	my $result = undef;
+
+	print_msg ("[$$]: Removing lock file, if we owns it.", 3);
+	if ( -e "$lockfile" ) {
+		if(!open(LOCKFILE, $lockfile)) {
+			print_err ("[$$]: Can't read $lockfile, will not delete it!", 1);
+			syslog_err("[$$]: Can't read $lockfile, will not delete it");
+			exit(1);
+		}
+		my $pid = <LOCKFILE>;
+		chomp($pid);
+		close(LOCKFILE);
+		if ($$ != $pid) {
+			print_warn ("[$$]: Lockfile $lockfile belongs to other process $pid, will not delete it!", 2);
+			syslog_warn("[$$]: Lockfile $lockfile belongs to other process $pid, will not delete it");
+			return(1);
+		}
+	}
+	remove_lockfile();
+	return (1);
+}
+
 # accepts no arguments
 # accepts the path to a lockfile and tries to remove it
 # returns undef if lockfile isn't defined in the config file, and 1 upon success

And now the test again:

# ls -la storage/
total 16
drwx------ 4 root root 4096 Jul 16 14:48 .
drwx------ 4 root root 4096 Jul 16 14:48 ..
drwx------ 3 root root 4096 Jul 16 14:47 hourly.0
drwx------ 3 root root 4096 Jul 16 14:46 hourly.1

Start two ./rsnapshot.patch instances:

# ./rsnapshot.patch -c ./rsnapshot.conf hourly ; ls -l rsnapshot.pid
WARNING: [2976]: Lockfile /root/test_lazy_delete/rsnapshot.pid belongs to other process 2979, will not delete it!
-rw------- 1 root root 4 Jul 16 14:48 rsnapshot.pid

(Note that the lockfile still exists!)

# ./rsnapshot.patch -c ./rsnapshot.conf sync
#

And here is the log.

'hourly' starts and goes up to `rm -rf`:

[16/Jul/2014:14:48:22] ./rsnapshot.patch -c ./rsnapshot.conf hourly: started
[16/Jul/2014:14:48:22] Setting locale to POSIX "C"
[16/Jul/2014:14:48:22] echo 2976 > /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:48:22] mv /root/test_lazy_delete/storage/hourly.1/ /root/test_lazy_delete/storage/_delete.2976/
[16/Jul/2014:14:48:22] mv /root/test_lazy_delete/storage/hourly.0/ /root/test_lazy_delete/storage/hourly.1/
[16/Jul/2014:14:48:22] rm -f /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:48:22] /root/test_lazy_delete/rm_wait -rf /root/test_lazy_delete/storage/_delete.2976

'sync' starts:

[16/Jul/2014:14:48:26] ./rsnapshot.patch -c ./rsnapshot.conf sync: started
[16/Jul/2014:14:48:26] Setting locale to POSIX "C"
[16/Jul/2014:14:48:26] echo 2979 > /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:48:26] mkdir -m 0755 -p /root/test_lazy_delete/storage/.sync/
[16/Jul/2014:14:48:26] /root/test_lazy_delete/rsync_wait -a --delete --numeric-ids --relative --delete-excluded --link-dest=/root/test_lazy_delete/storage/hourly.1/localhost/ /root/test_lazy_delete/./data /root/test_lazy_delete/storage/.sync/localhost/

'hourly' finishes the lazy delete, but notices that the lockfile no longer belongs to it, and leaves it in place:

[16/Jul/2014:14:48:52] [2976]: Removing lock file, if we owns it.
[16/Jul/2014:14:48:52] [2976]: Lockfile /root/test_lazy_delete/rsnapshot.pid belongs to other process 2979, will not delete it!
[16/Jul/2014:14:48:52] WARNING: ./rsnapshot.patch -c ./rsnapshot.conf hourly: completed, but with some warnings

'sync' finishes and deletes its lockfile:

[16/Jul/2014:14:48:56] rsync succeeded
[16/Jul/2014:14:48:56] touch /root/test_lazy_delete/storage/.sync/
[16/Jul/2014:14:48:56] No directory to delete: /root/test_lazy_delete/storage/_delete.2979
[16/Jul/2014:14:48:56] [2979]: Removing lock file, if we owns it.
[16/Jul/2014:14:48:56] rm -f /root/test_lazy_delete/rsnapshot.pid
[16/Jul/2014:14:48:56] ./rsnapshot.patch -c ./rsnapshot.conf sync: completed successfully

--
Dmitriy Matrosov

Test config was:

#################################################
# rsnapshot.conf - rsnapshot configuration file #
#################################################
#                                               #
# PLEASE BE AWARE OF THE FOLLOWING RULES:       #
#                                               #
# This file requires tabs between elements      #
#                                               #
# Directories require a trailing slash:         #
#   right: /home/                               #
#   wrong: /home                                #
#                                               #
#################################################

#######################
# CONFIG FILE VERSION #
#######################

config_version	1.2

###########################
# SNAPSHOT ROOT DIRECTORY #
###########################

# All snapshots will be stored under this root directory.
#
snapshot_root	/root/test_lazy_delete/storage

# If no_create_root is enabled, rsnapshot will not automatically create the
# snapshot_root directory. This is particularly useful if you are backing
# up to removable media, such as a FireWire or USB drive.
#
no_create_root	1

#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp	/bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
#
cmd_rm	/root/test_lazy_delete/rm_wait

# rsync must be enabled for anything to work. This is the only command that
# must be enabled.
#
cmd_rsync	/root/test_lazy_delete/rsync_wait

# Uncomment this to enable remote ssh backups over rsync.
#
cmd_ssh	/usr/bin/ssh

# Comment this out to disable syslog support.
#
cmd_logger	/usr/bin/logger

# Uncomment this to specify the path to "du" for disk usage checks.
# If you have an older version of "du", you may also want to check the
# "du_args" parameter below.
#
cmd_du	/usr/bin/du

# Uncomment this to specify the path to rsnapshot-diff.
#
#cmd_rsnapshot_diff	/usr/bin/rsnapshot-diff

# Specify the path to a script (and any optional arguments) to run right
# before rsnapshot syncs files
#
#cmd_preexec	/path/to/preexec/script

# Specify the path to a script (and any optional arguments) to run right
# after rsnapshot syncs files
#
#cmd_postexec	/path/to/postexec/script

# Paths to lvcreate, lvremove, mount and umount commands, for use with
# Linux LVMs.
#
#linux_lvm_cmd_lvcreate	/path/to/lvcreate
#linux_lvm_cmd_lvremove	/path/to/lvremove
#linux_lvm_cmd_mount	/bin/mount
#linux_lvm_cmd_umount	/bin/umount

#########################################
#           BACKUP INTERVALS            #
# Must be unique and in ascending order #
# i.e. hourly, daily, weekly, etc.      #
#########################################

retain	hourly	2
#retain	daily	14
#retain	weekly	4
#retain	monthly	3

############################################
#              GLOBAL OPTIONS              #
# All are optional, with sensible defaults #
############################################

# Verbose level, 1 through 5.
# 1     Quiet           Print fatal errors only
# 2     Default         Print errors and warnings only
# 3     Verbose         Show equivalent shell commands being executed
# 4     Extra Verbose   Show extra verbose information
# 5     Debug mode      Everything
#
verbose	2

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
#
loglevel	5

# If you enable this, data will be written to the file you specify. The
# amount of data written is controlled by the "loglevel" parameter.
#
logfile	/root/test_lazy_delete/rsnapshot.log

# If enabled, rsnapshot will write a lockfile to prevent two instances
# from running simultaneously (and messing up the snapshot_root).
# If you enable this, make sure the lockfile directory is not world
# writable. Otherwise anyone can prevent the program from running.
#
lockfile	/root/test_lazy_delete/rsnapshot.pid

# By default, rsnapshot check lockfile, check if PID is running
# and if not, consider lockfile as stale, then start
# Enabling this stop rsnapshot if PID in lockfile is not running
#
#stop_on_stale_lockfile	0

# Default rsync args. All rsync commands have at least these options set.
#
#rsync_short_args	-a
#rsync_long_args	--delete --numeric-ids --relative --delete-excluded

# ssh has no args passed by default, but you can specify some here.
#
#ssh_args	-p 22

# Default arguments for the "du" program (for disk space reporting).
# The GNU version of "du" is preferred. See the man page for more details.
# If your version of "du" doesn't support the -h flag, try -k flag instead.
#
#du_args	-csh

# If this is enabled, rsync won't span filesystem partitions within a
# backup point. This essentially passes the -x option to rsync.
# The default is 0 (off).
#
#one_fs	0

# The include and exclude parameters, if enabled, simply get passed directly
# to rsync. If you have multiple include/exclude patterns, put each one on a
# separate line. Please look up the --include and --exclude options in the
# rsync man page for more details on how to specify file name patterns.
#
#include	???
#include	???
#exclude	???
#exclude	???

# The include_file and exclude_file parameters, if enabled, simply get
# passed directly to rsync. Please look up the --include-from and
# --exclude-from options in the rsync man page for more details.
#
#include_file	/path/to/include/file
#exclude_file	/path/to/exclude/file

# If your version of rsync supports --link-dest, consider enable this.
# This is the best way to support special files (FIFOs, etc) cross-platform.
# The default is 0 (off).
#
link_dest	1

# When sync_first is enabled, it changes the default behaviour of rsnapshot.
# Normally, when rsnapshot is called with its lowest interval
# (i.e.: "rsnapshot hourly"), it will sync files AND rotate the lowest
# intervals. With sync_first enabled, "rsnapshot sync" handles the file sync,
# and all interval calls simply rotate files. See the man page for more
# details. The default is 0 (off).
#
sync_first	1

# If enabled, rsnapshot will move the oldest directory for each interval
# to [interval_name].delete, then it will remove the lockfile and delete
# that directory just before it exits. The default is 0 (off).
#
use_lazy_deletes	1

# Number of rsync re-tries. If you experience any network problems or
# network card issues that tend to cause ssh to crap-out with
# "Corrupted MAC on input" errors, for example, set this to a non-zero
# value to have the rsync operation re-tried
#
#rsync_numtries	0

# LVM parameters. Used to backup with creating lvm snapshot before backup
# and removing it after. This should ensure consistency of data in some special
# cases
#
# LVM snapshot(s) size (lvcreate --size option).
#
#linux_lvm_snapshotsize	100M

# Name to be used when creating the LVM logical volume snapshot(s).
#
#linux_lvm_snapshotname	rsnapshot

# Path to the LVM Volume Groups.
#
#linux_lvm_vgpath	/dev

# Mount point to use to temporarily mount the snapshot(s).
#
#linux_lvm_mountpath	/path/to/mount/lvm/snapshot/during/backup

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup	/root/test_lazy_delete/./data/	localhost/

-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (990, 'stable-updates'), (990, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=ru_RU.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages rsnapshot depends on:
ii  liblchown-perl  1.01-1+b2
ii  logrotate       3.8.1-4
ii  perl            5.14.2-21+deb7u1
ii  rsync           3.0.9-4

Versions of packages rsnapshot recommends:
ii  openssh-client [ssh-client]  1:6.0p1-4+deb7u2

rsnapshot suggests no packages.

-- Configuration Files:
/etc/cron.d/rsnapshot changed [not included]
/etc/rsnapshot.conf changed [not included]

-- no debconf information
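
For illustration, the ownership check that the patch adds can be sketched in plain shell. This is only a sketch under stated assumptions, not rsnapshot code: the function name mirrors the one in the patch, but the return codes and the message text are made up for the example. The PIDs 2976 and 2979 are the ones from the second test log above.

```shell
#!/bin/sh
# Illustration only; not part of rsnapshot. Return codes are hypothetical.

# Remove the lock file only if it records the given PID.
# Returns 0 if the lock was removed or did not exist,
#         1 if it belongs to another process,
#         2 if it could not be read.
remove_lockfile_if_owner() {
    lockfile=$1
    my_pid=$2

    # Nothing to do if the lock is already gone.
    [ -e "$lockfile" ] || return 0

    # The first line of the lock file holds the owner's PID.
    owner=
    if ! IFS= read -r owner < "$lockfile" && [ -z "$owner" ]; then
        return 2
    fi

    if [ "$owner" != "$my_pid" ]; then
        echo "lock $lockfile belongs to $owner, not removing" >&2
        return 1
    fi

    rm -f "$lockfile"
}

# Replay the sequence from the log.
lock=$(mktemp)
echo 2976 > "$lock"   # 'hourly' takes the lock
rm -f "$lock"         # lazy delete: 'hourly' drops the lock early
echo 2979 > "$lock"   # 'sync' takes the lock while 'hourly' runs rm -rf
remove_lockfile_if_owner "$lock" 2976 || echo "hourly: lock left in place"
remove_lockfile_if_owner "$lock" 2979 && echo "sync: lock removed"
```

Comparing the stored PID before deleting is what keeps the late second removal by 'hourly' from touching the lock that 'sync' holds. There is still a small window between reading the file and removing it, which this sketch, like the patch, does not close.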