Hi,
I am stuck on a weird problem with my 2-node Linux cluster and the synchronisation tool I use (rsync-2.5.1pre1). I have to copy a structure of 70 directories in which the files are hardlinked to the files of the 1st directory. This "orig data" directory holds about 30.000 files, so the number of files to sync is approx. 2.100.000. The overall size is about 9.2GB.

The method of synchronization is to have "rsync --daemon" running on the production server and to pull the data onto the backup server via rsync::. I secured this mechanism with a separate 100mbit network link that is used exclusively for this task.

The systems are
- linux-2.4.16
- glibc-2.2
- i686 (Coppermine at 900MHz) with 512MB RAM, 400MB swap on the main server and 128MB swap on the backup server (I know this is stupid, but at the moment I can't help it)

What happens? The synchronization starts and gobbles up approx. 300MB of RAM/swap while the server calculates the file list. On the client, approx. 620MB (i.e. nearly all) of the memory is allocated to compare the file list (the sync is run with -auvH). The files are transferred (on the first run, all of them, of course) and after an hour the transfer stops at the client with rsync.c:sig_int() called:

rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(230)
rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(741)

I do not see *anything* that is interfering (not me either). OK, I say, better luck next time. However, although no rsync process remains on the client (backup server) side, the main server is a different story: "rsync --daemon" did reduce its memory usage during the file transfer, but after the client broke off communications it still has a child hanging around with 200MB of memory in use! So when I run rsync the next time, 500MB of memory will be eaten up by both rsyncs on the main server (the new and the old), which is quite a lot.

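
As a rough sanity check on the figures above, here is a minimal back-of-the-envelope sketch. It uses only the numbers quoted in this mail; it assumes that 1MB means 2**20 bytes and that the backup server also has 512MB RAM (both are assumptions, and the variable names are just illustrative).

```python
# Back-of-the-envelope check of the memory figures quoted in the mail.
# Assumption: 1 MB = 2**20 bytes (the mail does not say which convention is meant).
MB = 2**20

directories = 70
files_per_directory = 30_000
total_entries = directories * files_per_directory  # entries in the file list

server_list_bytes = 300 * MB  # approx. usage while the server builds the file list
client_list_bytes = 620 * MB  # approx. usage on the client during the comparison

# Average memory cost per file-list entry on each side.
server_bytes_per_entry = server_list_bytes / total_entries
client_bytes_per_entry = client_list_bytes / total_entries

# Assumption: the backup server has 512 MB RAM like the main server, plus 128 MB swap.
client_total_bytes = (512 + 128) * MB
client_headroom_mb = (client_total_bytes - client_list_bytes) / MB

print(f"file-list entries: {total_entries}")
print(f"server: about {server_bytes_per_entry:.0f} bytes per entry")
print(f"client: about {client_bytes_per_entry:.0f} bytes per entry")
print(f"client headroom (RAM + swap): about {client_headroom_mb:.0f} MB")
```

Under these assumptions the client is left with very little free RAM plus swap while the transfer runs.
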
Unfortunately, the second (and third) attempts to sync break off after some time with messages similar to those shown above, and the hanging processes on the main server end up holding 700-800MB of memory. The result? The production server dies a slow and painful out-of-mem death unless I do a "killall -9 rsync" after some time....

Any comments on how to debug this? My only idea is that maybe the kernel on the client side silently sends a signal to the rsync process due to excessive memory usage. How can I avoid that behaviour? (The client system was otherwise quite happy, as it does nothing other than rsyncing...)

Regards,
- Birger