On Fri, 2008-12-05 at 11:42 -0500, Shawn Geraghty wrote: > Upgrading will be done, but not really an option at this point, unless > of course, that is the fix. I'm a limited resource and other stuff is > backing up.
I understand, but if the strace does indicate that you're hitting a hang bug in rsync 2.6.9, I don't know that anyone will care to investigate the bug unless it can be reproduced with the latest stable rsync. > Here is an strace from server A. > > > 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644, > st_size=3519, ...}) = 0 > 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 92) = 92 > 11242 write(1, "recv_file_name(usr/share/icons/L"..., 64) = 64 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, "9\0\0\t", 4) = 4 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 57) = 57 > 11242 time(NULL) = 1228482295 > 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644, > st_size=3519, ...}) = 0 > 11242 write(3, "2008/12/05 08:04:55 [11242] make"..., 85) = 85 > 11242 write(1, "make_file(1,usr/share/icons/Loco"..., 57) = 57 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, "\f\0\0\7", 4) = 4 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, "\276&\5p.pngt\1\0\0", 12) = 12 > 11242 time(NULL) = 1228482295 > 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644, > st_size=3519, ...}) = 0 > 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 88) = 88 > 11242 write(1, "recv_file_name(usr/share/icons/L"..., 60) = 60 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, ":\0\0\t", 4) = 4 > 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 58) = 36 > 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout) > 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout) > 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout) > [...] So the receiver was waiting for the sender to send it the rest of the (non-incremental) file list. That's what it's supposed to do, so I need to know what the sender was doing to diagnose the hang. > The server being backed up doesn't run rsync as a daemon. I've read > in the rsync docs that backups can be done initiated from the server > with out the client running rsync as a daemon. For this case, I don't > think I can get an strace. But, if possible, let me know and I will > track down how to do that. You can, and you'll need to if we are to get to the root of the problem. See the "rsync-debug" script described at: http://rsync.samba.org/issues.html > I also tried to back-up another server, and captured an strace from > Server A. Here is that trace. > > The big difference here was that I had to ^C the process after about > an hour. Usually, it looks like ssh shuts down the tunnel after a > period of 15 minutes. > > 4872 <... select resumed> ) = 2 (in [4], out [6]) > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4872 write(6, "\374\17\0\7glade2\0\f.xpm\0\347images\0\6/yelp\0"..., > 16384 <unfinished ...> > 4891 <... select resumed> ) = 1 (in [6], left {60, 0}) > 4872 <... write resumed> ) = 16384 > 4891 read(6, <unfinished ...> > 4872 read(4, <unfinished ...> > 4891 <... read resumed> "urrent.desktop\0\5win95.desktop\0\3756"..., > 4092) = 4092 > 4872 <... read resumed> "\243\36c\274aVv\324\16\307z > \347\17\274\247\34\331\327\333X\256\177\222\360\246\"\v\301\"pA%"..., > 8192) = 8192 > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4872 select(8, [4 5], [], NULL, NULL <unfinished ...> > 4891 <... select resumed> ) = 1 (in [6], left {60, 0}) > 4872 <... select resumed> ) = 1 (in [4]) > 4891 read(6, <unfinished ...> > 4872 read(4, <unfinished ...> > 4891 <... read resumed> "\374\17\0\7", 4) = 4 > 4872 <... read resumed> "\270\335\204#\373QM\217U\225K > \241\22\2750\357\256\2504\215\206L\255M\264\364\362\267\366D > \335\350"..., 8192) = 3040 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "t-preferences.glade2\0\1asciitable"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4872 select(8, [4 5], [6], NULL, NULL <unfinished ...> > 4891 read(6, <unfinished ...> > 4872 <... select resumed> ) = 1 (out [6]) > 4891 <... read resumed> "glade2\0\f.xpm\0\347images\0\6/yelp\0\5/ > em"..., 4092) = 4092 > 4872 write(6, "\374\17\0\7ORWARDING_DENIED\0\1TP_DISABLE"..., 16384 > <unfinished ...> > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4872 <... write resumed> ) = 16384 > 4891 <... select resumed> ) = 1 (in [6], left {60, 0}) > 4872 select(8, [4 5], [], NULL, NULL <unfinished ...> > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "t-config-nfs.png\0\371test.py\0\0mainW"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "RWARDING_DENIED\0\1TP_DISABLED\0\3FA"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "NG_DENIED\0\1TP_DISABLED\0\3FAILURE\0"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "ORWARDING_DENIED\0\1TP_DISABLED\0\3F"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "elogs\0\377suexec\0\0certwatch\0\0aepdae"..., 4092) = > 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "slinux\0\0talk\0\1elnet\0\0ime\0\377wget\0\0"..., 4092) > = 4092 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "\374\17\0\7", 4) = 4 > 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0}) > 4891 read(6, "changethehostname\0\0einit_wrapper"..., 4092) = 4092 > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4871 <... select resumed> ) = 0 (Timeout) > 4871 select(5, [4], [], NULL, {60, 0} <unfinished ...> > 4891 <... select resumed> ) = 0 (Timeout) > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4871 <... select resumed> ) = 0 (Timeout) > 4871 select(5, [4], [], NULL, {60, 0} <unfinished ...> > 4891 <... select resumed> ) = 0 (Timeout) > 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...> > 4871 <... select resumed> ) = 0 (Timeout) > > . > . > same message another 100 times Here it looks like a file transfer was in progress and the receiver (4891) was waiting for more file data from the sender. Again, I need to see what the sender was doing. Matt -- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html