On Thu, 31 Jan 2002, Dave Dykstra wrote:

> It's up to Martin to decide, but I'm sorry to tell you that I'm opposed to
> a --move-files option. I think that if somebody wants to do that they
> should do it with an external program after rsync returns a clean exit
> code. It seems to me that it goes against the purpose of rsync because
> after the files are removed from the sending side there's nothing left to
> sync later.
I use rsync instead of scp to copy all my files from system to system, even when I'm not going to synchronize anything. The reason is that it does so many things the right way that scp doesn't support (e.g. scp opens a new ssh connection for every file; it has no option to write data to a temp file outside of the destination dir and move it into place when complete; it has no include/exclude options; a non-recursive copy doesn't handle directories as nicely as rsync does; etc.). So, I understand where you're coming from, but I look at rsync as a general file-copying tool (that is also very efficient at updating files) rather than just as a tool that keeps files in sync. I know that the line has to be drawn somewhere, though, when deciding how much is too much.

> I see that Tridge liked the idea in general but had some problems with your
> implementation:
>
> http://lists.samba.org/pipermail/rsync/2001-May/004282.html
>
> Have you addressed his concern?

That was my earliest patch, from back before I understood the data flow between all the rsync modules. It was my work on the move-files option that prompted me to do all the no-hang work, including the patch that is required by the move-files option. You'll find later discussion on the list where Tridge (I believe) also objected to a buffer that could grow dynamically; I then changed my implementation to use a fixed-size buffer, which is in the current patch.

Here's an overview of what the no-hang patch does, with some move-files comments as well.

When the receiver process is created, it forks off a generator process on the same machine with two pipes between them (both flowing from the receiver to the generator). The first is an error channel (which is also used for verbose output) and the second is a redo channel that sends the numbers of the files that need to be reprocessed.
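For concreteness, here's a rough model of that two-channel layout. This is a Python sketch, not rsync's actual code (rsync is C), and the names and the 4-byte message framing are invented for illustration; it just shows two pipes flowing receiver-to-generator, one for error/verbose text and one carrying redo file numbers with -1 as the end marker:

```python
import os
import struct

REDO_EOF = -1  # hypothetical end-of-run marker on the redo channel

def send_redo(fd, index):
    """Write one file number as a 4-byte signed int (framing invented)."""
    os.write(fd, struct.pack('<i', index))

def read_redo(fd):
    """Read one file number back off the redo pipe."""
    return struct.unpack('<i', os.read(fd, 4))[0]

# Two pipes, both flowing receiver -> generator, as described above.
err_r, err_w = os.pipe()    # error channel (also carries verbose output)
redo_r, redo_w = os.pipe()  # redo channel (numbers of files to reprocess)

# "Receiver" side: report a failed file and signal end of run.
os.write(err_w, b'verbose: file 5 failed verification\n')
send_redo(redo_w, 5)
send_redo(redo_w, REDO_EOF)

# "Generator" side: drain both channels.
msg = os.read(err_r, 64)
redo_list = []
while (idx := read_redo(redo_r)) != REDO_EOF:
    redo_list.append(idx)
```

In the real program the two ends live in separate forked processes, of course; a single process is used here only so the round trip is easy to see.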
In the generator, the first channel is constantly checked for content, even when we're reading the redo channel or writing out data to the sender (this is necessary to keep the receiver from blocking while it tries to send the generator data at a moment when the generator is busy doing something else). However, the redo pipe is not currently kept clear: it is assumed that the number of redo items will fit within a pipe's data buffer. This assumption is usually right, but for a really large number of files the pipe might fill up and cause rsync to hang. (Also, my move-files patch uses this channel, so it is imperative that the redo channel be kept clear for --move-files to work.)

My no-hang patch adds an array of flag ints, one for each item in the list of files being sent. The read code in io.c is then extended to allow the redo pipe to be monitored, recording each redo item that shows up in the flag array. This keeps the channel clear and provides a way to regenerate the list of redo items for the generator. (The move-files patch extends this to also flag which items are complete and can be deleted.)

The only complicating factor is what happens when we read the redo channel's fd in a blocking manner at the end of the run (while we're waiting for the -1 EOF flag). During this, we still need to be reading the error channel and continuing to flush the write channel to the sender. If we break away from the redo-channel work to read or write something else, that read function might consume data from the redo channel as a side effect of its primary work (and we can't disable this, since we need to keep the redo channel from blocking while doing other read/write work). My solution makes the read code aware of when it is reading the redo channel, and has it return -3 when some side-effect work has already put data into the flag array (instead of trying to read even more data that may not be there).
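The flag-array idea can be sketched like this (again a hypothetical Python model, not the actual io.c code): pending redo numbers are drained into a per-file flag array using a zero-timeout poll, so the pipe itself stays clear no matter how many redos arrive:

```python
import os
import select
import struct

def drain_redo(redo_fd, flags):
    """Move any pending redo indices out of the pipe and into the flag
    array without blocking. Returns how many -1 EOF markers were seen.
    (A sketch of the idea only; the real patch works on raw fds in io.c.)"""
    eof_count = 0
    # select() with a zero timeout answers "is there data right now?"
    while select.select([redo_fd], [], [], 0)[0]:
        idx = struct.unpack('<i', os.read(redo_fd, 4))[0]
        if idx == -1:
            eof_count += 1
        else:
            flags[idx] = True  # this file needs to be reprocessed
    return eof_count

# Demo: three files in the transfer; files 0 and 2 need a redo.
r, w = os.pipe()
for idx in (0, 2, -1):
    os.write(w, struct.pack('<i', idx))
flags = [False] * 3
eofs = drain_redo(r, flags)
```

Because the results land in the array rather than staying queued in the pipe, any code path that happens to drain the pipe as a side effect leaves the data where the redo-reading function can still find it, which is what the -3 return value signals.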
Since the function calling the redo-read knows to look in the array, this results in all the data being processed properly and in the correct order. (Note that the function also keeps track of how many -1 EOF items it has seen, which is vital to its working properly.)

Once the redo channel has been made non-blocking, it is a very simple matter to add move-files support: the receiver sends the numbers of all the files that have been successfully written over to the generator process, which forwards them back to the sender via the normal (combo) data channel, and the sender reads these safe-to-delete messages and unlinks the corresponding file for each one it gets.

..wayne..
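P.S. The sender's side of that safe-to-delete exchange amounts to very little code. A hypothetical Python sketch (all names invented; rsync's real sender works from its internal file list):

```python
import os
import tempfile

def sender_handle_safe_to_delete(flist, index):
    """React to one safe-to-delete message from the generator: unlink
    the source file whose number just came back. (Sketch only; in rsync
    this happens on the sending side after a confirmed write.)"""
    os.unlink(flist[index])

# Demo: a throwaway file stands in for a successfully transferred source.
fd, path = tempfile.mkstemp()
os.close(fd)
flist = [path]
sender_handle_safe_to_delete(flist, 0)
```

The point is that nothing about the deletion itself is tricky; all the work is in guaranteeing that the file numbers flow back without deadlocking the pipes.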