I agree with Martin Pool's diagnosis about --times, or --archive,
which includes it along with many other options. If some of the
systems can't represent file mod times to the same resolution, the
--modify-window option may be helpful as well.
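To make the interaction of --times and --modify-window concrete, here is a minimal Python sketch of a size-plus-mtime "quick check" with a tolerance window. It illustrates the idea only and is not rsync's actual code. The function name quick_check_unchanged and its parameters are hypothetical.

```python
def quick_check_unchanged(size_src: int, mtime_src: float,
                          size_dst: int, mtime_dst: float,
                          modify_window: float = 0.0) -> bool:
    """Return True if a size+mtime quick check would treat the files as equal.

    Hypothetical sketch: modify_window stands in for rsync's --modify-window,
    i.e. mod times differing by at most this many seconds count as equal.
    """
    if size_src != size_dst:
        return False  # a size mismatch always means "needs a transfer"
    return abs(mtime_src - mtime_dst) <= modify_window


# Same size, mod times one second apart (e.g. a coarse-resolution filesystem).
print(quick_check_unchanged(4096, 1000.0, 4096, 1001.0))                   # False
print(quick_check_unchanged(4096, 1000.0, 4096, 1001.0, modify_window=1))  # True
```

With a window of 0, the one-second skew makes the file look changed, so it has to be examined; widening the window lets the quick check skip it.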
If you don't care to update timestamps, you can use --size-only, in
which case files will be assumed to be the same if they're the same
size, even if they have different mod times. Or, if local bandwidth
to disk is sufficiently high, you can achieve the same benefit as
--times with good timestamps by using --checksum, but then each
end will need to read every file in the entire hierarchy to compute
its checksum. This is normally not a win.
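The trade-off between --size-only and --checksum can be sketched as follows. This is a hypothetical Python illustration, not rsync's implementation: it assumes MD5 as the whole-file checksum, and the names file_digest and same_file are made up for the example. As with rsync's --checksum, files of different sizes are treated as different without being read; only same-size files get hashed, and that is where the cost of reading every byte on both ends comes in.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Hash a whole file (assumption: MD5). Reads every byte of the file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def same_file(src: str, dst: str, mode: str = "size-only") -> bool:
    """Decide whether two files count as identical under the given mode."""
    if os.stat(src).st_size != os.stat(dst).st_size:
        return False  # differing sizes: different in every mode, nothing read
    if mode == "size-only":
        return True   # cheap: metadata only, contents never examined
    if mode == "checksum":
        return file_digest(src) == file_digest(dst)  # expensive: full reads
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Demo with two same-size files whose contents differ.
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        for path, body in ((a, b"hello"), (b, b"jello")):
            with open(path, "wb") as f:
                f.write(body)
        print(same_file(a, b, "size-only"))  # True: sizes match
        print(same_file(a, b, "checksum"))   # False: contents differ
```

The demo also shows the risk of --size-only: a same-size edit goes unnoticed.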
But even without the timestamps in sync (with --ignore-times, on
rsync runs without --times, or on the first rsync after using another
distribution tool that doesn't synchronize timestamps), rsync
should still give you the benefit of "the rsync algorithm", which
performs distributed file comparisons very efficiently indeed.
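The heart of "the rsync algorithm" is a weak checksum that can be rolled forward one byte at a time, backed by a strong checksum that confirms candidate matches. Below is a toy Python sketch of both halves: the receiver computes per-block signatures of its copy, and the sender slides a window over its own copy looking for those blocks. The weak checksum follows the one described in the rsync technical report, but using MD5 as the strong checksum, a fixed block size, and this output format are simplifying assumptions, and all function names are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) halves of the weak checksum of a block directly."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte forward in O(1) time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # assumption: MD5 as the strong checksum


def signatures(old: bytes, block_size: int) -> dict[int, list[tuple[int, bytes]]]:
    """Receiver side: weak key -> [(block index, strong checksum)]."""
    sigs: dict[int, list[tuple[int, bytes]]] = {}
    for idx, start in enumerate(range(0, len(old), block_size)):
        block = old[start:start + block_size]
        a, b = weak_checksum(block)
        sigs.setdefault(a + (b << 16), []).append((idx, strong(block)))
    return sigs


def delta(new: bytes, sigs: dict[int, list[tuple[int, bytes]]],
          block_size: int) -> list[tuple[str, object]]:
    """Sender side: ("match", idx) for known blocks, ("literal", bytes) otherwise."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i, n = 0, len(new)
    a = b = None
    while i < n:
        window = new[i:i + block_size]
        if a is None:
            a, b = weak_checksum(window)
        # A weak hit is only a candidate; the strong checksum confirms it.
        hit = next((idx for idx, s in sigs.get(a + (b << 16), [])
                    if s == strong(window)), None)
        if hit is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("match", hit))
            i += len(window)
            a = b = None  # start a fresh window after the matched block
            continue
        literal.append(new[i])
        if i + block_size < n:
            a, b = roll(a, b, new[i], new[i + block_size], block_size)
        else:
            a = b = None  # window is shrinking at the tail; recompute directly
        i += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


old = b"the quick brown fox jumps"
print(delta(old, signatures(old, 8), 8))
# [('match', 0), ('match', 1), ('match', 2), ('match', 3)]
print(delta(b"XX" + old, signatures(old, 8), 8))
# [('literal', b'XX'), ('match', 0), ('match', 1), ('match', 2), ('match', 3)]
```

With identical files every block comes back as a match, so only the signatures and short match tokens would need to cross the network.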
When the rsync algorithm must be run for a pair of identical files,
the savings are still pretty dramatic: the data transferred is little
more than a checksum for every block of the file. As long as bandwidth to
disk on each end is dramatically greater than the network bandwidth
between machines, rsync can give a big performance win by every
measure. But for directory trees that change infrequently, it's
important to get the times right, or to use --size-only, to allow
rsync to avoid having to read through most of the files.
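A back-of-envelope model shows why disk bandwidth relative to network bandwidth matters. Everything here is an assumption for illustration: 4-byte weak plus 16-byte strong checksums per block, a fixed 64 KiB block size, and both ends reading their copies in parallel so disk time counts once. rsync itself picks block sizes and checksum lengths differently, so treat the numbers as rough orders of magnitude; the function names are hypothetical.

```python
import math


def signature_bytes(file_size: int, block_size: int,
                    weak_bytes: int = 4, strong_bytes: int = 16) -> int:
    """Approximate bytes of per-block checksums sent for one file (assumed sizes)."""
    blocks = math.ceil(file_size / block_size)
    return blocks * (weak_bytes + strong_bytes)


def estimated_seconds(file_size: int, block_size: int,
                      disk_bw: float, net_bw: float) -> tuple[float, float]:
    """Return (plain copy time, rsync time for identical files), in seconds.

    Bandwidths are in bytes per second. The rsync estimate assumes both ends
    read their whole copy in parallel, then essentially only checksums cross
    the network.
    """
    full_copy = file_size / net_bw
    rsync_identical = (file_size / disk_bw
                       + signature_bytes(file_size, block_size) / net_bw)
    return full_copy, rsync_identical


GiB = 1 << 30
print(signature_bytes(GiB, 64 * 1024))  # 327680 bytes of checksums for 1 GiB
# Fast disks, slow link: rsync wins by a wide margin.
print(estimated_seconds(GiB, 64 * 1024, disk_bw=100e6, net_bw=1e6))
# Slow disks, fast link: reading everything costs more than just copying.
print(estimated_seconds(GiB, 64 * 1024, disk_bw=1e6, net_bw=100e6))
```

In both cases rsync still reads the whole file on each end, which is why getting the times right (or using --size-only) matters for trees that rarely change.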
One other note to remember: with really gigantic trees, containing
zillions and zillions of files on each end, you may need to break up
the rsync into multiple transfers of subtrees, because rsync builds
an in-memory list with an entry for each file in the sync, and
sufficiently large transfers can cause it to exhaust memory.
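One way to break up a huge transfer is to run a separate rsync per top-level subdirectory. The sketch below is hypothetical: the helper names, the "-a" default, and splitting only at the top level are assumptions, and plain files sitting directly in the top-level directory are not covered, so they would need a separate pass. Each command uses trailing slashes so that a source directory's contents land in the matching destination directory.

```python
import os
import subprocess


def subtree_commands(src_root: str, dest_root: str, subdirs: list[str],
                     rsync_args: tuple[str, ...] = ("-a",)) -> list[list[str]]:
    """Build one rsync argument list per subdirectory (no I/O, easy to inspect)."""
    src_root = src_root.rstrip("/")
    dest_root = dest_root.rstrip("/")
    return [
        ["rsync", *rsync_args, f"{src_root}/{name}/", f"{dest_root}/{name}/"]
        for name in subdirs
    ]


def top_level_subdirs(src_root: str) -> list[str]:
    """List real (non-symlink) subdirectories directly under src_root."""
    return sorted(
        entry.name for entry in os.scandir(src_root)
        if entry.is_dir(follow_symlinks=False)
    )


def run_in_pieces(src_root: str, dest_root: str) -> None:
    """Run the per-subtree transfers one after another, stopping on failure."""
    for cmd in subtree_commands(src_root, dest_root, top_level_subdirs(src_root)):
        subprocess.run(cmd, check=True)


# Illustration with hypothetical paths; this only prints, nothing is executed.
for cmd in subtree_commands("/srv/data", "backup:/srv/data", ["home", "www"]):
    print(" ".join(cmd))
```

Each rsync process then only has to hold the file list for its own subtree, at the cost of more separate runs.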
-Bennett