On 4/12/05, Lester Hightower wrote:
> The actual replication happens in user-land with rsync as the transport.
> I think rsync will have to be tweaked a little to make this work, but
> given all the features already in rsync I don't think this will be a big
> deal. I envision an rsync running on Host A like:
>
>   # rsync --constant --from0 --files-from-fifo=/vol/0/chglog.fifo ...
>
> that will be communicating with an "rsync --constant ..." on the other
> end. The --constant flag is my way of stating that both rsyncs should
> become daemons and plan to "constantly" exchange syncing information until
> killed -- that is, this is a "constant" rsync, not just one run.
Lester:

Something like this is very high on my list of products I wish I had. I
frequently use rsync to replicate data on a near real-time basis. My
biggest pain point is replicating filesystems with many (millions of)
small files. The time rsync spends traversing these directories is
immense.

There have been discussions in the past of making an rsync that would
replicate the contents of a raw device directly, saving the time spent
checking each small file:

http://lists.samba.org/archive/rsync/2002-August/003545.html
http://lists.samba.org/archive/rsync/2003-October/007466.html

The consensus on the list at those times seemed to be that rsync is not
the best utility for this, since it's designed to transfer many files
rather than just one really big "file" (the contents of the device).

Although the more recent of those discussions is almost 18 months old, I
have seen no sign of an rsync-a-device utility. If it exists, it might
be the solution to what you propose, and it would work on more than
Linux. To achieve your goal with such a utility you would simply do
something like this:

+ for each device
++ make a snapshot if your LVM supports it
++ transfer the diffs to the remote device
+ go back and do it all again

If the appropriate permissions were in place this could be done entirely
in user mode, which is a great advantage for portability.

As you touched on in your original message, knowing what's changed since
the last run would be very helpful in reducing the amount of data that
needs to be read on the source side. In my experience, sequential reads
like this, even on large devices, don't take a huge amount of time
compared with accessing large numbers of files. If there were only a few
files on a mostly-empty volume the performance difference would be more
substantial. ;-)

Another thought to eliminate the kernel dependency is to combine the
inode walk done by the "dump" utility with the rsync algorithm to reduce
the file data transferred.
The inode walk would be filesystem-specific, but could be done in user
space using existing interfaces.

-- Steve
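
P.S. A rough sketch of the "transfer the diffs to the remote device"
step from the loop above, in Python. This is not the rsync algorithm (no
rolling checksum); it just compares digests of blocks at fixed offsets,
which seems adequate for a device image where data does not shift. The
function names, the 4096-byte block size, and the use of SHA-1 are all
my own assumptions for illustration, and it assumes the source and the
replica are the same size.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; tune for the real device


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one digest per fixed-size block of the file or device at path."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha1(block).digest())
    return digests


def changed_blocks(src_path, dst_digests, block_size=BLOCK_SIZE):
    """Yield (offset, data) for each source block that differs from the replica."""
    with open(src_path, "rb") as f:
        index = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            known = dst_digests[index] if index < len(dst_digests) else None
            if hashlib.sha1(block).digest() != known:
                yield index * block_size, block
            index += 1


def apply_blocks(dst_path, blocks):
    """Write the given (offset, data) blocks into the replica; return the count."""
    count = 0
    with open(dst_path, "r+b") as f:
        for offset, data in blocks:
            f.seek(offset)
            f.write(data)
            count += 1
    return count


def sync_once(src_path, dst_path, block_size=BLOCK_SIZE):
    """One pass of the loop: digest the replica, copy only the differing blocks."""
    dst_digests = block_digests(dst_path, block_size)
    return apply_blocks(dst_path, changed_blocks(src_path, dst_digests, block_size))
```

Here both sides are local paths, so sync_once(snapshot_path, replica_path)
does the whole pass in one process. Between two hosts, the digest list
would travel one way and only the changed blocks the other. The source
is still read in full on every pass, which is where a change log like
the one Lester describes would help.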