Hi,

One of our clients has an rsync server hang that looks similar to bug 1442.
The problem occurs only when they use our WAN acceleration box, so I suspect
that the changes the box makes to TCP window size, window scaling, etc. might
be causing rsync to behave like this. I did not want to attach this to
bug 1442 because I'm not sure whether it's relevant. They are using an older
version of rsync, 2.6.4.

At the time of the hang, there seems to be some data in both the receive and
send queues of the rsync TCP connection:

tcp    60240  66080 lc-irv-1548:rsync           lc-sina-167.sg.broadc:57018 ESTABLISHED

and the process seems to be timing out on the select call:

[pid  2515] open("daily/r2006_07_24/library/tsmc30/sc/v13/avanti/v3.0/ss/bcm30_13a/CEL/PW3_3LC:17", O_RDONLY) = 4
[pid  2515] fstat(4, {st_mode=S_IFREG|0755, st_size=1698, ...}) = 0
[pid  2515] read(4, "\0\0\0\0\0\0\0\214H\0\260\0\20\0\0\10\0\4A\1\10A\2\t\f"..., 1698) = 1698
[pid  2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
[pid  2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
[pid  2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
[pid  2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
[pid  2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
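
For reference, the select calls above pass only a write set ([5]) and no read
set, so each timeout means fd 5 did not become writable within 60 seconds.
Below is a minimal Python sketch of that pattern. It is not rsync's code: it
uses a local socket pair instead of a TCP connection, and the helper names
fill_send_buffer and wait_writable are made up for illustration. The writer
fills its send buffer while the peer reads nothing, and a write-only select
then times out until the peer drains the data.

```python
import select
import socket


def fill_send_buffer(sock):
    """Send on a non-blocking socket until the kernel refuses more data."""
    sock.setblocking(False)
    chunk = b"\0" * 4096
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # Send buffer is full: this is the state the hung writer is in.
            return total


def wait_writable(sock, timeout):
    """Rough equivalent of select(nfds, NULL, [fd], NULL, {timeout, 0})."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def drain(sock):
    """Read everything currently queued on a non-blocking socket."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return total
        if not data:
            return total
        total += len(data)


# Hypothetical stand-in for the rsync connection: a local socket pair.
writer, reader = socket.socketpair()

queued = fill_send_buffer(writer)
# The peer has read nothing, so a write-only select times out (0.2 s here
# instead of the 60 s seen in the strace output).
stalled = not wait_writable(writer, 0.2)

drained = drain(reader)
# After the peer drains its side, the writer becomes writable again.
recovered = wait_writable(writer, 0.2)

print(f"queued={queued} stalled={stalled} drained={drained} recovered={recovered}")
writer.close()
reader.close()
```

If this pattern matches what is happening here, the question is why the
remote end (or something in between) is no longer accepting data, rather than
why select misses readable data.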

There were some comments attached to bug 1442 that sound like the problem our
customer is experiencing (some descriptor not being included in the select call,
and the select call timing out even when there is data to be read). I'm
attaching the last portion of the strace output and the netstat output.

I would very much appreciate any tips or advice that can help me troubleshoot
the issue.

Thanks

Kutluk

Attachment: netstat-a
Description: netstat-a

Attachment: rsync-server.out.partial
Description: rsync-server.out.partial
