Hello,

I'm using rsync 3.0.9 to back up several NFS shares from a fileserver, mounted over NFSv3, to a local RAID on a backup server. Both servers are running Ubuntu 12.04 Server LTS. The fileserver's filesystem is ext4. The NFS shares are mounted on the backup server as follows:

    fileserver:/mnt/storage/share1 /mnt/share1 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
    fileserver:/mnt/storage/share2 /mnt/share2 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
    fileserver:/mnt/storage/share3 /mnt/share3 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
These shares contain a large number of files, including SVN checkouts, extracted kernel trees, etc.

I've run into a problem where rsync appears to hang or block indefinitely. It usually happens when backing up one particular share, share3, but occasionally it happens with one of the other shares instead. A cron job starts backing up share3 nightly at 20:15. When the problem does not occur, the backup typically finishes around 20:45. When it does occur, rsync blocks indefinitely. I run rsync under the "timeout" command so that it is killed if it has not finished by 9:00 the next day:

    timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05

The exit code is 137, which I believe is 128 plus 9, the number of the SIGKILL signal sent by timeout.

Here are the timeout and rsync processes. As you can see, 1915 is in uninterruptible sleep, but I believe that is normal:

    root 1914 0.0 0.0 10148 492 ? S Sep05 0:00 timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
    root 1915 0.0 0.3 81240 27784 ? D Sep05 0:20 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
    root 1916 0.0 0.2 120028 19032 ? S Sep05 0:22 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
    root 1917 0.0 0.3 138272 26612 ? S Sep05 0:07 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05

Running strace on the processes shows that they are not actively doing anything:

    # strace -p 1914
    Process 1914 attached - interrupt to quit
    wait4(1915,
    # strace -p 1915
    Process 1915 attached - interrupt to quit
    # strace -p 1916
    Process 1916 attached - interrupt to quit
    select(4, [3], [], NULL, {10, 731653}^C <unfinished ...>
    Process 1916 detached
    # strace -p 1917
    Process 1917 attached - interrupt to quit
    select(1, [0], [], NULL, {27, 691627}^C <unfinished ...>
    Process 1917 detached

From the output in my rsync log file, I can see the last directory that it copied a file from. I ran "time find /path/to/that/dir -type f" on that directory and on some other directories on share3, and all of them returned quickly; I was not able to make "find" block.

The rsync cron jobs that run for share1 and share2 typically complete successfully, and those shares are mounted over NFS with the same mount options from the same fileserver. I do not see anything obviously related in dmesg on either the backup server or the fileserver.

Does anyone have an idea what is causing rsync to hang, or whether there is a way to have it retry or skip a file if there is a problem, rather than blocking forever? The --timeout option seems like it will abort the entire sync, but I would like to just skip over the bad section and continue with the rest of the backup. Is this possible?

Thanks,
Andrew
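For reference, this is how I am reading the exit status of the timeout wrapper. It is only a minimal sketch, assuming a POSIX sh and GNU timeout, which exits with 124 when it stops the command with the default TERM signal, and assuming the usual convention that a status above 128 means 128 plus a signal number. The function name decode_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn an exit status from "timeout ... rsync ..."
# into a short human-readable description.
decode_status() {
    status=$1
    if [ "$status" -eq 124 ]; then
        # GNU timeout reports 124 when the time limit was reached
        echo "timed out (timeout sent TERM)"
    elif [ "$status" -gt 128 ]; then
        # Convention: 128 + N means the process ended due to signal N
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# The status seen in the failing backups: 137 - 128 = 9 (SIGKILL)
decode_status 137
```

In the cron script this would be called as decode_status "$?" directly after the timeout command, so the log shows whether the run ended normally or was killed.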