(I sent this to the bug address several weeks ago; I'm now resending it
to the list.)
I'm seeing numerous rsync (v2.4.6) problems under Linux 2.4.2 (RedHat
7.1) and stock 2.4.4 on several machines. rsync often hangs while copying
files from NFS or local disks to local disks. Strangely, the hang clears as
soon as I strace one of the three rsync processes!
I can trigger the problem just by rsyncing the standard Linux 2.4.4 kernel
source tree into a new (empty) directory:
rsync -raxv /data/jss/sysadmin/linux-2.4.4/linux .
The problem is repeatable with this source tree on several machines (one
PII machine and an Athlon). It also occurs when copying the stock
Linux 2.4.5 source tree (download it to reproduce the problem); in that
case it hangs on linux/scripts/ver_linux.
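To reproduce this without watching the terminal, something like the following might help. It is only a sketch: run_with_timeout is a name I made up, the timeout is arbitrary, and it assumes a Python 3 interpreter is available. Note that only the process started directly is killed on timeout; any other rsync processes may need cleaning up by hand.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it was still
    running after timeout_s seconds and had to be killed."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the direct child here.
        return None


# Example call (paths are the ones from this report; adjust to taste):
#   rc = run_with_timeout(
#       ["rsync", "-raxv", "/data/jss/sysadmin/linux-2.4.4/linux", "/tmp/kernel/"],
#       600)
#   print("hung" if rc is None else "exited with %d" % rc)
```

A None result after a generous timeout would correspond to the hang described above; any integer is rsync's normal exit code.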
For example, here is a run that hangs:
xpc6:~> rsync -raxv /data/jss/sysadmin/linux-2.4.4/linux /tmp/kernel/
[....]
linux/scripts/tkparse.c
linux/scripts/tkparse.h
linux/scripts/ver_linux
linux/vmlinux
[hangs here, for at least several hours]
(switch to another window)
xpc6:~> ps auxw|grep rsync
jss 3165 10.9 1.7 3144 2272 pts/0 S 14:20 0:19 rsync -raxv /data/jss/sysadmin/linux-2.4.4/linux .
jss 3166 1.1 1.7 3128 2216 pts/0 S 14:20 0:02 rsync -raxv /data/jss/sysadmin/linux-2.4.4/linux .
jss 3167 10.4 1.7 3136 2236 pts/0 S 14:20 0:18 rsync -raxv /data/jss/sysadmin/linux-2.4.4/linux .
xpc6:~> su
[blah]
[root@xpc6 jss]# strace -p 3165
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
gettimeofday({992352238, 401281}, NULL) = 0
wait4(3166, 0xbfffdd80, WNOHANG, NULL) = 0
gettimeofday({992352238, 401846}, NULL) = 0
gettimeofday({992352238, 402088}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 20000}) = 0 (Timeout)
gettimeofday({992352238, 420838}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
gettimeofday({992352238, 431066}, NULL) = 0
wait4(3166, 0xbfffdd80, WNOHANG, NULL) = 0
gettimeofday({992352238, 431568}, NULL) = 0
gettimeofday({992352238, 431809}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 20000}) = 0 (Timeout)
[lots more of these]
[root@xpc6 jss]# strace -p 3166
[program starts working again]
select(2, NULL, [1], NULL, {17, 860000}) = 1 (out [1], left {17, 830000})
write(1, "\27\0\0\tlinux/arch/ia64/sn/io/\n", 27) = 27
select(6, [3 5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
read(5, "\30\0\0\t", 4) = 4
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
read(5, "linux/arch/ia64/sn/sn1/\n", 24) = 24
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})
write(1, "\30\0\0\tlinux/arch/ia64/sn/sn1/\n", 28) = 28
select(6, [3 5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
read(5, "\32\0\0\t", 4) = 4
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
read(5, "linux/arch/ia64/sn/tools/\n", 26) = 26
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})
write(1, "\32\0\0\tlinux/arch/ia64/sn/tools/\n", 30) = 30
select(6, [3 5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
read(5, "\27\0\0\t", 4) = 4
select(6, [5], NULL, NULL, {60, 0}) = 1 (in [5], left {60, 0})
[lots more]
[program finishes]
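In the second trace, each 4-byte read on fd 5 looks like a small header for the file name that follows. The sketch below decodes those headers on the assumption (inferred from the trace only, not checked against the rsync source) that they are little-endian 32-bit words with the payload length in the low 24 bits and some kind of tag in the top byte. decode_header is a hypothetical helper name.

```python
import struct


def decode_header(header):
    """Split a 4-byte header into (tag, length).

    Assumed layout: little-endian 32-bit word, length in the low
    24 bits, tag in the top 8 bits.
    """
    if len(header) != 4:
        raise ValueError("expected exactly 4 bytes")
    (word,) = struct.unpack("<I", header)
    return word >> 24, word & 0xFFFFFF


# Headers copied from the strace output (strace prints octal escapes).
for raw in (b"\27\0\0\t", b"\30\0\0\t", b"\32\0\0\t"):
    tag, length = decode_header(raw)
    print("tag=%d length=%d" % (tag, length))
```

Under that assumption the lengths come out as 23, 24 and 26, which agree with the 24- and 26-byte reads that follow the headers and with the 27-byte write (4 + 23), so the data flowing after the hang clears at least looks well formed.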
Has anyone else encountered this problem? Is it a kernel problem or an
rsync problem?
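In case it helps anyone answer that: the first trace (PID 3165) looks to me like an ordinary poll-and-sleep loop, checking with WNOHANG whether child 3166 has exited and sleeping a few milliseconds via select() in between. If so, 3165 is only waiting, and the real stall is presumably in 3166 or 3167. The sketch below shows that pattern as I read it from the trace; it is not rsync's actual code, the function names are made up, and it is Unix-only.

```python
import os
import select
import time


def wait_for_child(pid, interval_s=0.02):
    """Poll until child pid exits; return (exit_status, number_of_sleeps)."""
    polls = 0
    while True:
        # Non-blocking check, like wait4(pid, ..., WNOHANG, NULL) in the trace.
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:
            return os.WEXITSTATUS(status), polls
        polls += 1
        # select() with no descriptors is just a sleep, like
        # select(0, NULL, NULL, NULL, {0, 20000}) in the trace.
        select.select([], [], [], interval_s)


def demo():
    """Fork a child that exits with status 3 after 0.2 s, then poll for it."""
    pid = os.fork()
    if pid == 0:
        time.sleep(0.2)
        os._exit(3)
    return wait_for_child(pid)


if __name__ == "__main__":
    print(demo())
```

A loop like this never blocks for long, which would explain why stracing 3165 shows constant activity but does nothing to unstick the transfer.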
This is with the RedHat build of rsync, but I also downloaded the latest
CVS version and the problem is still there, even with the patch from
http://lists.samba.org/pipermail/rsync/2001-June/004370.html
applied.
Is there a different utility I can use instead of rsync in the meantime?
This problem is quite urgent.
Jeremy
--
Jeremy Sanders <[EMAIL PROTECTED]> http://www-xray.ast.cam.ac.uk/~jss/
Pembroke College, Cambridge. UK Institute of Astronomy, Cambridge. UK