Hello,

I'm using rsync 3.0.9 to back up several NFS shares from a fileserver, mounted
over NFSv3, to a local RAID on a backup server. Both servers run Ubuntu
Server 12.04 LTS. The fileserver's filesystem is ext4. The NFS shares are
mounted on the backup server as follows:
fileserver:/mnt/storage/share1 /mnt/share1 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
fileserver:/mnt/storage/share2 /mnt/share2 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
fileserver:/mnt/storage/share3 /mnt/share3 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
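
The options above are the ones requested at mount time. /proc/mounts on the
backup server shows the full set the kernel is actually using, including
defaults such as timeo and retrans, which determine how long a soft mount
retries a request before returning an error. A minimal sketch that prints them
for NFS mounts only; show_nfs_mounts is a hypothetical helper name, and the
optional argument exists only so it can be pointed at a saved copy of the file:

```shell
# show_nfs_mounts: hypothetical helper, not part of any package.
# Prints "mount point: options" for every nfs/nfs4 entry in a mounts table
# (default /proc/mounts; fields are device, mount point, type, options, ...).
show_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 ": " $4 }' "${1:-/proc/mounts}"
}

show_nfs_mounts
```
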

These shares contain a large number of files, including SVN checkouts,
extracted kernel trees, etc. I've run into a problem where rsync appears to
hang, blocking indefinitely, when backing up one particular share, share3;
occasionally it happens with one of the other shares instead. A cron job starts
the share3 backup nightly at 20:15. When the problem does not occur, the backup
typically finishes around 20:45; when it does occur, rsync blocks indefinitely.
I run rsync under the "timeout" command so that it is killed if it has not
finished by 9:00 the next day:
timeout -k 30s 764m rsync -av --modify-window=2 \
    --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ \
    /mnt/share3/ /mnt/backups/share3/2013-09-05
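
As a quick check of the 764m value: the job starts at 20:15, and adding 764
minutes lands at 08:59 the next morning, when timeout sends SIGTERM; the
-k 30s option adds a 30-second grace period before SIGKILL. The same
arithmetic in plain shell (the variable names are illustrative only):

```shell
# Illustrative variable names only.
# Cron start time (20:15) in minutes after midnight.
start=$((20 * 60 + 15))
# Duration given to timeout (764m).
limit=764
# Add the limit and wrap past midnight (1440 minutes per day).
end=$(( (start + limit) % 1440 ))
deadline=$(printf '%02d:%02d' $((end / 60)) $((end % 60)))
echo "SIGTERM at $deadline the next day, SIGKILL 30 seconds later if still running"
```
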
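The exit code is 137, which I believe is 128 + 9: the shell's convention for a
process terminated by signal 9 (SIGKILL), which timeout sends once the -k 30s
grace period has expired.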
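
To confirm that reading of 137, the same situation can be reproduced with a
throwaway command that ignores SIGTERM, so that timeout has to fall back to
SIGKILL after its -k grace period. In this sketch sleep stands in for the
stuck rsync, and the 1-second values are arbitrary:

```shell
# sleep stands in for the stuck rsync. The child shell ignores SIGTERM, and
# that ignored disposition is inherited by sleep, so timeout's SIGTERM after
# 1s has no effect; after the 1s -k grace period timeout sends SIGKILL (9).
timeout -k 1s 1s sh -c 'trap "" TERM; sleep 10'
status=$?
# 128 + signal number is the shell convention for "terminated by a signal".
echo "exit status: $status (signal $((status - 128)))"
```
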

Here are the timeout and rsync processes. As you can see, 1915 is in
uninterruptible sleep (state D), but I believe that is normal:
root      1914  0.0  0.0  10148   492 ?        S    Sep05   0:00 timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
root      1915  0.0  0.3  81240 27784 ?        D    Sep05   0:20 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
root      1916  0.0  0.2 120028 19032 ?        S    Sep05   0:22 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
root      1917  0.0  0.3 138272 26612 ?        S    Sep05   0:07 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
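
A process that stays in state D is blocked inside the kernel, and ps can show
which kernel function it is waiting in (its wait channel). A sketch that lists
only processes in uninterruptible sleep; d_state is just an illustrative
variable name. On an NFS client, a wait channel naming an nfs_ or rpc_
function would suggest the process is waiting on the server, though the exact
names vary between kernel versions:

```shell
# d_state is an illustrative variable name.
# Keep the header line plus every process whose STAT column starts with "D"
# (uninterruptible sleep), showing its kernel wait channel and command line.
d_state=$(ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/')
printf '%s\n' "$d_state"
```
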

Running strace on the processes shows that none of them is actively doing
anything:
# strace -p 1914
Process 1914 attached - interrupt to quit
wait4(1915,

# strace -p 1915
Process 1915 attached - interrupt to quit

# strace -p 1916
Process 1916 attached - interrupt to quit
select(4, [3], [], NULL, {10, 731653}^C <unfinished ...>
Process 1916 detached

# strace -p 1917
Process 1917 attached - interrupt to quit
select(1, [0], [], NULL, {27, 691627}^C <unfinished ...>
Process 1917 detached
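
strace most likely prints nothing for 1915 because the process is blocked
inside the kernel and never returns to a point where strace can report on it.
The /proc files below can show where it is blocked and which files it has
open. A minimal sketch; inspect_blocked is a hypothetical helper name, and
/proc/PID/stack usually requires root and a kernel built with stack tracing.
As root, `echo w > /proc/sysrq-trigger` is an alternative that logs the
stacks of all blocked tasks to dmesg.

```shell
# inspect_blocked: hypothetical helper. Usage: inspect_blocked PID
inspect_blocked() {
    pid=$1
    # Kernel function the process is currently sleeping in.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Kernel stack of the task; often unreadable without root.
    echo 'kernel stack:'
    cat "/proc/$pid/stack" 2>/dev/null || echo '  (not available)'
    # Open file descriptors; for rsync these should include the file it was reading.
    echo 'open files:'
    ls -l "/proc/$pid/fd"
}
```

On the backup server this would be run as `inspect_blocked 1915`.
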

From my rsync log file I can see the last directory it copied a file from. I
ran "time find /path/to/that/dir -type f" on that directory and on some other
directories on share3, and all of them returned quickly; I was not able to make
"find" block. The rsync cron jobs for share1 and share2 usually complete
successfully, even though those shares are mounted from the same fileserver
over NFS with the same mount options.
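
Note that `find -type f` only walks directory entries and never opens the
files, whereas rsync has to open and read every file it transfers. A closer
reproduction is to read the contents of every file under the suspect
directory, giving each read its own time limit. A sketch under that
assumption; find_slow_files and the 60-second default are hypothetical, and
file names containing newlines are not handled:

```shell
# find_slow_files: hypothetical helper. Usage: find_slow_files DIR [SECONDS]
# Reads every regular file under DIR and reports any whose read fails or
# does not finish within the limit.
find_slow_files() {
    dir=$1
    limit=${2:-60}
    find "$dir" -type f -print | while IFS= read -r f; do
        # -k 5s escalates to SIGKILL. Without --foreground, timeout signals its
        # whole process group, so the KILL also ends timeout itself and the
        # loop should move on even if cat stays stuck in the kernel.
        if ! timeout -k 5s "${limit}s" cat -- "$f" > /dev/null; then
            printf 'slow or unreadable: %s\n' "$f"
        fi
    done
}
```

For example, `find_slow_files /path/to/that/dir` reports nothing if every
file in the directory can be read in time.
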

I do not see anything obviously related in dmesg on either the backup server
or the fileserver. Does anyone have an idea what is causing rsync to hang, or
whether there is a way to have it retry or skip a file when there is a problem
rather than blocking forever? The --timeout option seems like it would abort
the entire sync, but I would like to just skip over the bad section and
continue with the rest of the backup. Is this possible?
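
rsync's --timeout=SECONDS is an I/O timeout, and when it expires rsync exits
with an error rather than skipping the current file. One possible workaround,
an assumption about how the job could be restructured rather than an rsync
feature, is to run a separate rsync per top-level directory, each under its
own timeout, so that a hang only costs that one directory. A sketch reusing
the paths and options from the cron command above; backup_per_dir and the
60m per-directory limit are hypothetical:

```shell
# backup_per_dir: hypothetical wrapper, one rsync per top-level directory.
# Paths and options mirror the cron job; LIMIT is an arbitrary per-run cap.
SRC=/mnt/share3
PREV=/mnt/backups/share3/2013-09-04
DEST=/mnt/backups/share3/2013-09-05
LIMIT=60m

backup_per_dir() {
    mkdir -p "$DEST" || return 1
    # Files directly in SRC: --exclude='*/' skips every directory.
    timeout -k 30s "$LIMIT" rsync -av --modify-window=2 --link-dest="$PREV" \
        --exclude='*/' "$SRC/" "$DEST/" ||
        echo "rsync of top-level files failed or timed out" >&2
    # Each non-hidden top-level directory gets its own run and time limit,
    # with --link-dest pointing at the same directory in the previous backup.
    # Hidden top-level directories are not matched and would need extra handling.
    for d in "$SRC"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        timeout -k 30s "$LIMIT" rsync -av --modify-window=2 \
            --link-dest="$PREV/$name" --exclude .svn/ \
            "$SRC/$name/" "$DEST/$name/" ||
            echo "rsync of $name failed or timed out" >&2
    done
}
```

The cron job would then call backup_per_dir instead of the single rsync; the
--link-dest hard links still work because each run points at the matching
subdirectory of the previous day's backup.
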

Thanks,

Andrew
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
