Hi Leen,

Thanks for your reply.

On Thu, 2009-06-11 at 08:24 +0200, Leen Besselink wrote:
> Leen Besselink wrote:
> > Daniel.Li wrote:
> >> Dear List,
> >>
> >> I'm trying to take a closer look at rsync code, and found when we run
> >> daemon, it will take a lot of CPU (400Mhz). So I'm interested in Which
> >> part of rsync code on ver 3.0.5 consuming CPU a lot?
> >>
> >> Can anyone here help to lighten me up? So I can try to improve the
> >> performance or low the CPU usage.
> >>
> >>
> >> I suspect that there are a few factors, which might related with CPU
> >> usage: rolling checksum/Disk IO(a slide window has been implemented),
> >> read or write?
> >>
> >>
> >> Hope I can find some info here! Thanks in advance!
> >>
> >>
> >
> > Hi Daniel,
> >
> > Not sure how much you know about how rsync works, but maybe you first want
> > to know how the algorithm works ? I'm fairly sure it's a large part of the
> > CPU-usage:
> >
> > http://www.samba.org/rsync/tech_report/
> >
> > But I personally enjoyed the talk Andrew Tridgell did at OLS in 2000
> > more, here is a transcript:
> >
> > http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html
> >
> > Here are the slides of the talk:
> >
> > ftp://ftp.samba.org/pub/tridge/talks/rsync_ols.tgz
> >
> > I wouldn't be surprised if you were able to find the mp3 online somewhere
> > with the filename:
> >
> > 2000-07-21_15-02-49_C_64.mp3

I'm glad to see the above info, and I'll take a look a little later. Really appreciated.

> >
> >
>
> I was checking the talk and did find this bit:
>
> "in fact the bottleneck, when people use the -z option, 90% of the CPU is
> in gzip, you know, the zlib library."
>
> So if you enabled compression, then you probably know where your CPU-time
> went.

Yes, indeed. Here, I think there are two questions:

a) CPU usage of the rsync GPL code:
As you said, the -z option is a factor. I disabled the "z" option, but rsync still uses a lot of CPU, around 87% on my 400 MHz ARM CPU.
So I think it has something to do with the algorithm (and the hardware). I hope to find some clues for lowering the CPU usage; maybe there is a way to optimize the code. I don't know the code very well. I think the 3.x versions have improved a lot over 2.6.9, but I'm wondering whether we could optimize it further.

b) My diff code contributes 10% or more CPU usage:
I just finished a diff module based on the rsync GPL code. It can save and restore diff data. But it DOES take a lot of CPU: usage rises from 87% to 97.7%. I think this extra 10% (or more) is contributed by my diff code. In theory, it should NOT need any extra CPU. The people maintaining the rsync code have more experience in this field, and I think they may have met this before.

Currently, I don't know the root cause of this extra usage. I suspect that the "sliding window for reading data" might be it.

My current procedure for checksum calculation is:
allocate buffer --> read data --> checksum --> release buffer

Rsync code:
create sliding window (first time, or when the window is too small) --> feed back data (check whether it is necessary to read data) --> checksum

I have to test the current code and then verify whether this is the root cause.

I also hope for advice or suggestions on the factors behind the CPU usage. Any advice or suggestion is appreciated.

--
Daniel Li
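
P.S. To make the two procedures above concrete, here is a minimal sketch in Python. It is not the rsync code and not my diff module. The names `checksums_per_block` and `checksums_windowed`, the block size, and the window size are all made up for illustration, and `zlib.adler32` only stands in for rsync's own weak checksum. It also assumes each read fills the whole window except at end of file, which holds for the in-memory stream used here.

```python
import io
import zlib

BLOCK = 700  # hypothetical block size, for illustration only


def checksums_per_block(f, block=BLOCK):
    """allocate buffer --> read data --> checksum --> release buffer, once per block."""
    sums = []
    while True:
        data = f.read(block)  # a fresh bytes object is allocated on every iteration
        if not data:
            break
        sums.append(zlib.adler32(data))  # stand-in for rsync's weak checksum
    return sums


def checksums_windowed(f, block=BLOCK, window=8 * BLOCK):
    """Reuse one window buffer and refill it only after it has been consumed."""
    buf = bytearray(window)  # allocated once, reused for every refill
    view = memoryview(buf)   # slicing a memoryview does not copy the data
    sums = []
    while True:
        n = f.readinto(buf)  # one read call covers several blocks
        if not n:
            break
        for off in range(0, n, block):
            sums.append(zlib.adler32(view[off:min(off + block, n)]))
    return sums


if __name__ == "__main__":
    data = bytes(range(256)) * 20
    a = checksums_per_block(io.BytesIO(data))
    b = checksums_windowed(io.BytesIO(data))
    print(len(a), a == b)
```

Both functions produce the same list of checksums; they differ only in how many read calls and buffer allocations they make. Whether that difference accounts for the extra 10% on my board is exactly what I still have to measure.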