Hi,

One of our clients has an rsync server hang that looks similar to bug 1442. The problem occurs only when they use our WAN acceleration box, so I suspect that the changes the box makes to TCP window size, window scaling, etc. might be causing rsync to behave like this. I did not want to attach this to bug 1442 because I'm not sure whether it is relevant. They are using an older version of rsync, 2.6.4.
At the time of the hang, there is data queued in both directions on the rsync TCP connection (netstat shows Recv-Q 60240 and Send-Q 66080):

    tcp  60240  66080 lc-irv-1548:rsync  lc-sina-167.sg.broadc:57018  ESTABLISHED

and the process keeps timing out on the select call:

    [pid 2515] open("daily/r2006_07_24/library/tsmc30/sc/v13/avanti/v3.0/ss/bcm30_13a/CEL/PW3_3LC:17", O_RDONLY) = 4
    [pid 2515] fstat(4, {st_mode=S_IFREG|0755, st_size=1698, ...}) = 0
    [pid 2515] read(4, "\0\0\0\0\0\0\0\214H\0\260\0\20\0\0\10\0\4A\1\10A\2\t\f"..., 1698) = 1698
    [pid 2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
    [pid 2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
    [pid 2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
    [pid 2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)
    [pid 2515] select(6, NULL, [5], NULL, {60, 0}) = 0 (Timeout)

Some of the comments attached to bug 1442 sound like the problem our customer is experiencing (some descriptor not being included in the select call, and the select call timing out even when there is data to be read).

I'm attaching the last portion of the strace output and the netstat output. I would very much appreciate any tips or advice that can help me troubleshoot the issue.

Thanks,
Kutluk
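For illustration only, the pattern in the trace (a select that watches just the write side of one descriptor and times out while data sits unread) can be reproduced locally. This is a minimal sketch, not rsync's code: it assumes a local socketpair as a stand-in for the TCP connection, a 0.2 second timeout instead of 60 seconds, and the helper names (fill_send_buffer, wait_writable, demo) are hypothetical.

```python
import select
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096):
    """Write until the kernel refuses more data; return the bytes queued."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # Send queue is full: nothing more can be written until the
            # peer reads some of the queued data.
            return total


def wait_writable(sock, timeout):
    """Same shape as the traced call: select(nfds, NULL, [fd], NULL, timeout)."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def drain(sock):
    """Read everything currently queued on sock without blocking."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return total
        if not data:
            return total
        total += len(data)


def demo(timeout=0.2):
    # Assumption: a local socketpair stands in for the real TCP connection.
    a, b = socket.socketpair()
    try:
        queued = fill_send_buffer(a)
        # The peer is not reading, so a write-only select times out,
        # which strace would show as "= 0 (Timeout)".
        stalled = not wait_writable(a, timeout)
        # Once the peer drains its receive queue, the writer can proceed.
        drain(b)
        recovered = wait_writable(a, timeout)
        return queued, stalled, recovered
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    queued, stalled, recovered = demo()
    print("bytes queued before stall:", queued)
    print("write-only select timed out:", stalled)
    print("writable after peer drained:", recovered)
```

If both ends of a connection reach this state at once, each waiting only for writability while its own receive queue holds unread data, neither side can make progress. Whether that is what happens here would need to be confirmed from the full strace and a trace of the client side.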
Attachments: netstat-a, rsync-server.out.partial