On Thu, 11 Apr 2002, Martin Pool wrote:

> I'd appreciate comments.

Hmm... As you may know, I'm the APT author and the administrator of the
top-level Debian mirrors and the associated mirror network. So,

> 3.2 rsync is too hard on servers

> If it is, then I think we should fix the problems, rather than
> invent a new system from scratch. I think the scalability problems
> are accidents of the current codebase, rather than anything inherent
> in the design.

It's true, I'm afraid. Currently on ftp.d.o:

  nobody    8835 25.7  0.3 22120  1740 ?  RN  Apr10 525:24 rsync --daemon
  nobody   22896  5.0  0.3 22828  1992 ?  SN  Apr11  21:20 rsync --daemon
  nobody    3907  7.3  0.5 22336  2820 ?  RN  Apr11  15:30 rsync --daemon
  nobody   10729 13.7  4.0 22308 20904 ?  RN  Apr11  13:10 rsync --daemon

The load average is currently > 7, all due to rsync. I'm not sure what
the one that has sucked up 500 minutes is actually doing, but I've come
to accept that as 'normal'. I expect some client has asked it to
recompute every checksum for the entire 30G of data and it's just
burning away processor power <sigh>.

We tend to allow only 10-15 simultaneous rsync connections because of
this.

Things are better now. In the past, with 2.2 kernels and somewhat slower
disks, rsync would not just suck up CPU power but would seriously hit
the drives as well. I think the improvements in inode/dentry caching in
2.4 and our new archive structure are largely responsible for making
that less noticeable.

IMHO, as long as rsync continues to have a server-heavy design, its
ability to scale is going to be quite poor. Right now there are 91
people connected to ftp/http on ftp.d.o; if they were using rsync, I'm
sure the poor server would be quite dead indeed.

> 3.1 Compressed files cannot be differenced

I recall seeing some work done to determine how much savings you could
expect if you used xdeltas of the uncompressed data. This would be the
best result you could expect from gzip --rsyncable. I recall the numbers
were disappointing; it was << 50% on average or somesuch.
It would be nice if someone could find that email or repeat the
experiments.

> 3.5 Goswin Brederlow's proposal to use the reverse rsync algorithm over
> HTTP Range requests

Several years ago I suggested this in a conversation with you on one of
the rsync lists. Someone else was able to pull a reference from the IBM
patent database and claimed it was the particular patent that prohibits
the server-friendly reverse implementation.

> 3.7 rsync uses too much memory

This only really seems to be true for tree mirroring; the file lists can
be very big indeed.

Jason