Hello - Running rsync v3.0.5 on a mixture of CentOS 4.6 and 5.1 systems, using hpnssh as the transport mechanism.
I am using rsync to replicate roughly 1TB of compressed log data per day from a number of systems for processing. Every hour the systems generate log files, compress them, and rsync pushes them out to a centralized set of redundant hosts whose storage sits on a clustered NFS file store. Every hour back-end systems pull these logs and process them. In total there are roughly 210,000 files to be transferred per day at the moment.

It seems that once or twice a week I get a corrupted file (at least for a while), which I would not expect, because as far as I can tell rsync doesn't rename the file until the copy has completed successfully. Eventually rsync does recover on its own, but not before the back-end process tries to grab the corrupted file and fails; usually the team of people watching this process copies the file by hand before rsync has a chance to catch up. I've tested this several times by copying a file and aborting it mid-copy, and rsync has never renamed it - the temporary dot file is left behind, which is what I expect.

But the main reason for my post is that rsync does not appear to obey the file order I specify in my command. After a failed transfer I want rsync to send the oldest files first so that the back-end systems don't get backed up, but even though the file list I provide has files in a specific order, rsync seems to rearrange them into alphabetical order anyway.

My rsync script kills any running copies of rsync before it starts, both to make sure it's the only one going and to make sure rsync doesn't get stuck for some reason. The script runs twice an hour on each system (half of this data is being pushed from the east coast of the U.S. to the west coast). An example from today: at about 12:15PM rsync was running and copying a specific file when it was killed by the newer rsync script, which then proceeded to copy files that were *just* generated in the past 5 minutes instead of the files generated in the previous hour. That script ran until about 12:45, when it was killed again and a new copy of the script started; it too copied files from the latest hour rather than the older files first. That one was killed at about 1:15, a new job kicked off, and it finally caught everything up by 1:30. It would have caught up sooner, but we were having problems with an ISP that was causing throughput to be much lower than normal.

I build the file list with a simple ls -ltr | awk '{print $9}'. I keep quite a bit of logging/debug information, so I can confirm that the file lists given to rsync were ordered correctly and that the files rsync actually transferred were in alphabetical order, not in the order listed in the file.

So the problem is really twofold: there seems to be some sort of issue/bug(?) in rsync where, under certain circumstances, it renames one of its temporary "dot" files to the real file name even though the file hasn't been copied successfully; and, more importantly, I'd like to transfer the oldest files first, either via the files-from list or some other means. Given the large number of files I would prefer not to execute a separate rsync process for every file!

I've been using rsync for years, but this is by far the biggest implementation I've done with it.
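The only workaround I've sketched out so far (not something I'm running in production - the batch size and temp path below are arbitrary) is to split the mtime-sorted files-from list into fixed-size batches and run rsync once per batch, so any re-sorting rsync does can only happen within a batch rather than across the whole backlog:

  # Sketch only: chunk the oldest-first list into batches of 500 files
  # so older batches are always transferred before newer ones.
  rm -f /tmp/rsync_batch.*
  split -l 500 /home/logrsync/conf/rsync_log_file_list.20090207_124642 /tmp/rsync_batch.
  for batch in /tmp/rsync_batch.*; do
      rsync -ae "/usr/bin/hpnssh -o TcpRcvBufPoll=yes -o NoneEnabled=yes -o NoneSwitch=yes" \
          --timeout=600 --partial \
          --files-from="$batch" /path/to/source 10.254.213.203:/path/to/dest
  done

That keeps it down to a few hundred rsync invocations per run instead of one per file, but I'd be happy to hear if there's a cleaner way to get ordered transfers out of a single rsync.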
Here is a sample of the command I am using (wrapped across lines for readability):

  ssh 10.254.213.203 "mkdir -p /path/to/dest/" && \
    rsync -ae "/usr/bin/hpnssh -v -o TcpRcvBufPoll=yes -o NoneEnabled=yes -o NoneSwitch=yes" \
    --timeout=600 --partial --log-format="[%p] %t %o %f (%l/%b)" \
    --files-from=/home/logrsync/conf/rsync_log_file_list.20090207_124642 \
    /path/to/source 10.254.213.203:/path/to/dest \
    1>>/home/logrsync/logs/server_name_rsync_log_transfer_20090207_124642.log 2>&1

Side note - does anyone have numbers for running rsync over ssh over a WAN? Even with several hundred megabits of bandwidth available on each side, each file copy most often caps out at about 700kB/s with hpnssh (lower with normal ssh); latency is about 80-90ms between the sites. There are about 45 servers, so we still get good performance in aggregate, but it would be nice to get better per-server performance as well if possible. I think hpnssh is the right approach with its auto-tuning, but my expectations were for higher throughput than I ended up getting. It also seems that even 5% packet loss absolutely kills performance - throughput can drop 75-90% - which is the problem one of my ISPs appears to be having.

thanks

nate
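P.S. One back-of-the-envelope number on the throughput question, assuming the limiting factor is a fixed ~64KB effective window somewhere on the path (that window size is just my guess, not something I've measured):

  per-stream throughput ~= window / RTT
  64 KB / 0.085 s ~= 750 KB/s

which is suspiciously close to the ~700kB/s per file I'm seeing, so I wonder whether some TCP or SSH channel buffer along the way is still effectively stuck near 64KB despite hpnssh.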