On Sun, Apr 3, 2022 at 4:59 AM Wols Lists <antli...@youngman.org.uk> wrote:
>
> On 03/04/2022 02:15, Bill Kenworthy wrote:
> > Rsync has a bwlimit argument which helps here. Note that rsync copies
> > the whole file on what it considers local storage (which can be mounted
> > network shares) ... this can cause a real slowdown.
>
> It won't help on the initial copy, but look at the - I think it is -
> --in-place option.
>
> It won't help with the "read and compare", but it only writes what has
> changed, so if a big file has changed slightly, it'll stop it re-copying
> the whole file.
You might also try ionice, though I find it hit and miss once you start
adding layers like lvm/mdadm/etc, since I don't know that the kernel
actually sees all the downstream queues when it is throttling processes.
I haven't used it on LVM in a while, though.

Replication performance (especially if you want to do a second pass with
rsync) is the sort of thing that pvmove/etc helps with, since the move
happens down at the block layer and, as far as the filesystem is
concerned, nothing gets moved. Snapshot-supporting filesystems like
zfs/btrfs are also better if you want to sync things up, because they can
rapidly identify all the changes between two snapshots without having to
read anything but metadata, assuming you manage things correctly and
maintain a common baseline between them.

Of course, all of those options require that they be set up in advance.
If you just have two generic filesystems and want to sync them, then
rsync is your main option.

Oh, one thing I would suggest: if the two copies are on different hosts,
actually run rsyncd or do the sync over ssh, so that rsync recognizes the
situation and spawns its own rsync process on the remote host, with all
the hashing/etc running local to the drives rather than over a network
mount. This greatly reduces your network traffic, which is likely to be
the bottleneck.

All the same, if you want to actually use hashes to find differences, and
not just rely on size/mtime, there is no getting around having to read
all the data off the disk.
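In case it is useful, here is roughly what those approaches look like on
the command line. This is just an untested sketch; the pool, subvolume,
and host names (/data, tank/data, backuphost, /backup) are made up, so
adjust for your setup. Note the rsync option is actually spelled
--inplace.

  # throttle a local rsync copy: idle I/O priority plus a bandwidth cap
  ionice -c3 rsync -aHAX --inplace --bwlimit=50000 /data/ /mnt/backup/data/
  # --bwlimit is in KiB/s by default

  # btrfs: send only the delta between two read-only snapshots
  # (assumes .snap1 was already sent once, i.e. a common baseline exists)
  btrfs subvolume snapshot -r /data /data/.snap1
  # ... time passes, /data changes ...
  btrfs subvolume snapshot -r /data /data/.snap2
  btrfs send -p /data/.snap1 /data/.snap2 | ssh backuphost btrfs receive /backup

  # zfs equivalent: incremental send between two snapshots
  zfs snapshot tank/data@snap1
  zfs snapshot tank/data@snap2
  zfs send -i tank/data@snap1 tank/data@snap2 | ssh backuphost zfs receive -F backup/data

  # plain rsync between hosts over ssh, so the remote end runs its own
  # rsync local to its drives instead of hashing over a network mount
  rsync -aHAX --inplace /data/ backuphost:/backup/data/
  # add -c/--checksum only if you really need hash-based comparison;
  # it forces every file to be read on both sides

--
Rich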