[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

> Actually, the lack of -W isn't helping me at all.  The reason is that
> even for the stuff I do over the network, 99% of it is compressed with
> gzip or bzip2.  If the files change, the originals were changed and a
> new compression is made, and usually most of the file is different.

Just to clarify - when you say "over the network" you mean true
client/server rsync (or rsync across an rsh/ssh stream), and not just
one rsync copying to or from a network mount point, right?  In the
latter case, not having -W is hurting you, never helping - rsync still
has to read both files across the mount anyway, so the delta pass only
adds checksum overhead.
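For example (hypothetical paths and hosts, just to illustrate the two
modes):

    # Client/server transfer over ssh: the delta algorithm runs on
    # both ends, so only changed blocks cross the wire.
    rsync -av -e ssh /data/dump.db backup:/data/dump.db

    # One rsync copying across an NFS mount: every byte it reads or
    # writes crosses the wire regardless, so skipping the delta pass
    # with -W (whole-file) is a pure win.
    rsync -av -W /data/dump.db /mnt/backup/dump.db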

But yes, any format (e.g., encryption, compression) that effectively
distributes changes randomly over a file is going to be a killer for
rsync.
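You can see it with a quick experiment (hypothetical file names -
big.dat is any reasonably large file): flip one byte near the front
and compress both copies:

    cp big.dat big2.dat
    # Overwrite a single byte near the start of the copy.
    printf 'X' | dd of=big2.dat bs=1 seek=1000 conv=notrunc
    gzip -c big.dat  > big1.gz
    gzip -c big2.dat > big2.gz
    # Count the differing bytes - nearly the entire compressed output
    # changes, so rsync finds almost no matching blocks to reuse.
    cmp -l big1.gz big2.gz | wc -l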

For the case of gzip'd files when a client and server rsync are in
use, you may want to look back through the archives of this list -
there was a reference to a patch for the gzip sources that produces
rsync-friendly gzip output.  Not as efficient as transferring the
uncompressed version, but far better than normal gzip.

Ah yes - here was the URL:

http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2
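With the patch applied, usage is just an extra flag (as in the sizes
quoted below):

    gzip --rsyncable file
    gzip -9 --rsyncable file

As I understand it, the patch periodically resets the compressor's
state based on a rolling sum of the input, so a local change only
perturbs the compressed output near that spot instead of everything
downstream.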

When I tried it (1/2001), these were my test results:

    For comparison, here's a database file (the delta between one day and
    the next), transferred both uncompressed and gzip'd (normal and -9).
    For the uncompressed file I also used a fixed 1K block size, since I
    know that's the page size for the database - the other columns use the
    default block-size computation.  (I tried the 1K block size with the
    gzip'd version too, but it was worse, as expected.)

                    Normal     Normal+1K    gzip       gzip-9               
    Size            54206464   54206464     21867539   21845091
    Wrote            2902182    1011490      3169864    3214740
    Read               60176     317648        60350      60290
    Total            2962358    1329138      3230214    3275030

    Speedup            18.30      40.78        6.77       6.67
    Compression         1.00       1.00        2.479      2.481
    Normalized         18.30      40.78       16.78      16.54
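(In case it's not obvious, the derived rows come straight from the
others - Speedup is Size over Total, the same figure rsync reports,
and Normalized scales that by the compression ratio so the columns are
comparable against the uncompressed size:

    Speedup    = Size / Total           e.g. 54206464 / 2962358 = 18.30
    Normalized = Speedup * Compression  e.g. 6.77 * 2.479       = 16.78

so even rsync-friendly gzip recovers only part of the uncompressed
file's delta efficiency.)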

And in terms of size:
   
    As Rusty's page notes, the rsyncable files are slightly larger, but
    not tremendously so.  In my case:

            Normal gzip:            21627629
            gzip --rsyncable:       21867539
            gzip -9 --rsyncable:    21845091

    So about a 1-1.1% hit in compressed size.
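    (That's 21867539 / 21627629 ≈ 1.011 for the default level - the
    arithmetic behind the ~1% figure.)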


Personally, here we end up just leaving the major stuff we transfer
uncompressed - since we're on slow analog lines, the bandwidth savings
easily outweigh the cost in disk space, particularly in cases like our
databases, where knowing the page size and how the file changes goes a
long way.

> It definitely helped for transferring ISO images where the whole image
> would be changed if some files changed.  I set the chunk size to 2048
> for that.  Why it defaults to 700 seems odd to me.
    
Not sure - perhaps some early empirical work.  When I'm moving files
that I know something about, I definitely control the block size
myself.  For example, when moving databases with a 1K page size, I
always use a multiple of that (since I know a priori that's the
granularity at which the database "dirties" the file), and then I
scale it up a bit based on database size, to get a reasonable tradeoff
between per-block overhead and the extra data transferred when a
change is detected.
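A sketch of what that looks like on the command line (hypothetical
file names and sizes; -B is rsync's block-size option):

    # Modest 1K-page database: match the page size exactly.
    rsync -av -B 1024 small.db backup:/data/

    # Much larger database: scale the block size up, still a multiple
    # of the 1K page, to keep per-block checksum overhead reasonable.
    rsync -av -B 8192 big.db backup:/data/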

-- David

/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: [EMAIL PROTECTED]  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/
