-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Is there a special reason why you don't use rsync or rsync over ssh as
the communication method instead of NFS?  You are stuck with
--whole-file in this configuration, not to mention the expense of doing
a ton of stat() calls over NFS.

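Roughly what I mean, as a sketch only: it assumes the backup server can
ssh to the host as "fileserver" (key-based login for cron is an
assumption too) and that rsync is installed on the fileserver.  The
paths, dates and options are taken from your command; the -e ssh and the
host: prefix are the only changes.  It just prints the command so you
can review it before it goes into cron.

```shell
#!/bin/sh
# Sketch: pull share3 from the fileserver over ssh instead of reading it
# through the NFS mount. SRC_HOST and the ssh setup are assumptions.

SRC_HOST="fileserver"            # assumed ssh-reachable host name
SRC_DIR="/mnt/storage/share3/"   # export path from the mount table
DEST_ROOT="/mnt/backups/share3"  # local backup root on the RAID
PREV="2013-09-04"                # previous snapshot used by --link-dest
TODAY="2013-09-05"               # snapshot being created

# Build the command line and print it; nothing is executed here.
rsync_over_ssh_cmd() {
    printf '%s\n' "timeout -k 30s 764m rsync -av -e ssh --modify-window=2 --link-dest=$DEST_ROOT/$PREV --exclude .svn/ $SRC_HOST:$SRC_DIR $DEST_ROOT/$TODAY"
}

rsync_over_ssh_cmd
```

With a remote source the file list is built by an rsync process running
on the fileserver itself, so the stat() calls hit the local ext4
filesystem rather than going over NFS, and --whole-file is no longer
forced the way it is when both paths look local.
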
Also, you can use lsof to see exactly what file or directory rsync has
open.

On 09/06/13 15:55, Andrew Martin wrote:
> Hello,
>
> I'm using rsync 3.0.9 to backup several NFS shares from a
> fileserver, mounted over NFSv3, to a local RAID on a backup server.
> Both servers are running Ubuntu 12.04 server LTS. The fileserver's
> filesystem is ext4. The NFS shares are mounted on the backup server
> as follows:
> fileserver:/mnt/storage/share1 /mnt/share1 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
> fileserver:/mnt/storage/share2 /mnt/share2 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
> fileserver:/mnt/storage/share3 /mnt/share3 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
>
> These shares contain a large number of files, including SVN
> checkouts, extracted kernel trees, etc. I've run into a problem
> where rsync will appear to hang or block indefinitely when backing
> up one particular share, share3, but occasionally it will happen
> with one of the other shares instead. A cron starts backing up
> share3 nightly at 20:15. When this blocking problem does not occur,
> the backup typically finishes around 20:45. However, when this
> problem occurs, rsync blocks indefinitely. I have configured rsync
> to run using the "timeout" command so that it will be killed if not
> finished by 9:00 the next day:
> timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> The exit code is 137, which I believe is 128 (from rsync) plus 9 sent
> by timeout.
>
> Here are the child rsync processes, as you can see 1915 is in
> uninterruptible sleep, but I believe that is normal:
> root      1914  0.0  0.0  10148   492 ?        S    Sep05   0:00 timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1915  0.0  0.3  81240 27784 ?        D    Sep05   0:20 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1916  0.0  0.2 120028 19032 ?        S    Sep05   0:22 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1917  0.0  0.3 138272 26612 ?        S    Sep05   0:07 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
>
> Running strace on the processes shows that the processes are not
> actively doing anything:
> # strace -p 1914
> Process 1914 attached - interrupt to quit
> wait4(1915,
>
> # strace -p 1915
> Process 1915 attached - interrupt to quit
>
> # strace -p 1916
> Process 1916 attached - interrupt to quit
> select(4, [3], [], NULL, {10, 731653}^C <unfinished ...>
> Process 1916 detached
>
> # strace -p 1917
> Process 1917 attached - interrupt to quit
> select(1, [0], [], NULL, {27, 691627}^C <unfinished ...>
> Process 1917 detached
>
> Based on the output in my rsync log file, I can see the last
> directory that it copied a file from. I ran "time find
> /path/to/that/dir -type f" on that directory and some other
> directories on share3 and all of them returned quickly; I was not
> able to make "find" block. The rsync crons that run for share1 and
> share2 typically complete successfully, and they are also mounted
> over NFS with the same mount options from the same fileserver.
>
> I do not see anything obviously related in dmesg on either the
> backup server or fileserver. Does anyone have an idea on what is
> causing rsync to hang, or if there is a way to have it retry or
> skip a file if there is a problem rather than blocking forever? The
> --timeout option seems like it will abort the entire sync, but I
> would like to just skip over the bad section and continue with the
> rest of the backup. Is this possible?
>
> Thanks,
>
> Andrew
>

- -- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		ke...@futurequest.net  (work)
	Orlando, Florida		k...@sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIqQOQACgkQVKC1jlbQAQcjwQCg1OhS8NciSJXolj6uND88O7R+
mLwAn0OPMGRfI/OrXjaNNBnz4RSUvS2U
=6/1y
-----END PGP SIGNATURE-----