[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:
> Actually, the lack of -W isn't helping me at all. The reason is that
> even for the stuff I do over the network, 99% of it is compressed with
> gzip or bzip2. If the files change, the originals were changed and a
> new compression is made, and usually most of the file is different.
Just to clarify: when you say "over the network," you mean true
client/server rsync (or rsync across an rsh/ssh stream), and not a single
rsync whose source or destination sits on a network mount point, right?
In the latter case, not having -W is hurting you, never helping.
But yes, any format (e.g., encryption, compression) that effectively
distributes changes randomly over a file is going to be a killer for
rsync.
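If you want to see just how thoroughly compression scrambles things,
here's a quick sketch in Python (the payload is just a made-up
compressible stream; the sizes are arbitrary) - flip one byte in the
middle of the input and see how much of the compressed output survives:

    import random
    import zlib

    # Build a compressible test payload: ~1 MB of words from a tiny vocabulary.
    rng = random.Random(0)
    words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
    data = b" ".join(rng.choice(words) for _ in range(150000))

    # Flip a single byte halfway through the uncompressed data.
    changed = bytearray(data)
    changed[len(data) // 2] ^= 0xFF
    changed = bytes(changed)

    ca, cb = zlib.compress(data, 9), zlib.compress(changed, 9)

    # Measure how much of the two compressed streams is still identical.
    prefix = 0
    while prefix < min(len(ca), len(cb)) and ca[prefix] == cb[prefix]:
        prefix += 1

    print(f"compressed size:  {len(ca)} bytes")
    print(f"identical prefix: {prefix} bytes ({100.0 * prefix / len(ca):.1f}%)")

The streams match up to the deflate block containing the change and then
diverge completely, so rsync's block matching finds essentially nothing
to reuse past that point.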
For the case of gzip'd files when a client and server rsync are in
use, you may want to look back through the archives of this list -
there was a reference to a patch for the gzip sources that produces
rsync-friendly gzip output. Not as good as transferring the
uncompressed version, but far better than normal gzip.
Ah yes - here was the URL:
http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2
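For anyone curious, my understanding of the trick in that patch: keep a
rolling sum over the last few KB of input and reset the compressor
whenever the sum hits a magic value, so chunk boundaries depend only on
nearby content and a change can't cascade past the next reset. Here's a
rough sketch of the idea using Python's zlib (the window size, modulus,
and minimum chunk length are illustrative guesses, not the patch's
actual constants, and the real patch of course works inside gzip's own
deflate code):

    import zlib

    WINDOW = 4096     # rolling-sum window (illustrative)
    MODULUS = 4096    # boundary when the window sum is divisible by this
    MIN_CHUNK = 1024  # avoid pathologically tiny chunks on degenerate input

    def rsyncable_compress(data: bytes) -> bytes:
        """Compress with state resets at content-defined boundaries, so a
        local change only perturbs the output up to the next boundary."""
        comp = zlib.compressobj(9)
        out, rolling, start = [], 0, 0
        for i in range(len(data)):
            rolling += data[i]
            if i >= WINDOW:
                rolling -= data[i - WINDOW]  # slide the window forward
            if rolling % MODULUS == 0 and i + 1 - start >= MIN_CHUNK:
                # Boundary: emit this chunk and reset the deflate dictionary
                # so identical input after the boundary compresses identically.
                out.append(comp.compress(data[start:i + 1]))
                out.append(comp.flush(zlib.Z_FULL_FLUSH))
                start = i + 1
        out.append(comp.compress(data[start:]))
        out.append(comp.flush())
        return b"".join(out)

Since the boundary positions depend only on the preceding few KB of
input, an edit resynchronizes at the next boundary and everything beyond
it compresses byte-for-byte the same as before - which is exactly what
rsync's rolling-checksum matching can then exploit.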
When I tried it (1/2001), here were some test results:
For comparison, here's a database file (the delta between one day and
the next), transferred both uncompressed and gzip'd (normal and -9).
For the uncompressed case I also transferred with a fixed 1K block size,
since I know that's the page size for the database - the other columns
use rsync's default block-size computation (I tried 1K with the gzip'd
version too, but as expected it was worse).
                  Normal    Normal+1K         gzip       gzip-9
Size            54206464     54206464     21867539     21845091
Wrote            2902182      1011490      3169864      3214740
Read               60176       317648        60350        60290
Total            2962358      1329138      3230214      3275030
Speedup            18.30        40.78         6.77         6.67
Compression         1.00         1.00        2.479        2.481
Normalized         18.30        40.78        16.78        16.54

(Normalized is just Speedup x Compression - i.e., the speedup measured
against the uncompressed data size - so the four columns are directly
comparable.)
And in terms of size:
As Rusty's page comments, the --rsyncable files are slightly larger,
but not tremendously so. In my one test case:

Normal gzip:          21627629
gzip --rsyncable:     21867539
gzip -9 --rsyncable:  21845091

So about a 1-1.1% hit in compressed size.
Personally, here we end up just leaving the major stuff we transfer
uncompressed - since we're on slow analog lines, the bandwidth savings
easily justify the extra disk space, particularly in cases like our
databases, where knowing the page size and how the file changes goes a
long way.
> It definitely helped for transferring ISO images where the whole image
> would be changed if some files changed. I set the chunk size to 2048
> for that. Why it defaults to 700 seems odd to me.
Not sure - perhaps some early empirical work. When I'm moving files I
know something about, I definitely control the block size myself. For
example, when moving databases with a 1K page size I always use a
multiple of that (since I know a priori that's the granularity at which
the database "dirties" the file), and then I scale that up a bit with
database size, to get a reasonable tradeoff between per-block overhead
and the extra data transferred each time a changed block is detected.
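To make that concrete, the rule of thumb I use amounts to something like
this in Python (the square-root growth is just one way to "scale up a
bit" - pick whatever fits your files - and the 1K page size is specific
to our database):

    import math

    PAGE_SIZE = 1024  # the unit in which our database dirties the file

    def pick_block_size(file_size: int) -> int:
        """Scale the block size with sqrt(file size) as a middle ground
        between per-block overhead and re-sent data per changed block,
        then round down to a whole number of database pages."""
        raw = int(math.sqrt(file_size))
        return max(PAGE_SIZE, (raw // PAGE_SIZE) * PAGE_SIZE)

    # For the ~54 MB database in the table above:
    print(pick_block_size(54206464))  # -> 7168 (seven 1K pages)

The result then just gets handed to rsync via -B/--block-size.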
-- David
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: [EMAIL PROTECTED] /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/