Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

Alberto Accomazzi Mon, 17 May 2004 07:15:56 -0700


Chris,

to put things in the right perspective, you should read (if you haven't done so already) the original paper describing the design behind batch mode. The design and implementation of this functionality go back to a project called the Internet2 Distributed Storage Infrastructure (I2-DSI). As part of that project, the authors created a modified version of rsync (called rsync+) which had the capability of creating these batch sets for mirroring. Here are a couple of URLs describing the ideas and motivation behind it:

http://www.ils.unc.edu/i2dsi/unc_rsync+.html
http://www.ils.unc.edu/ils/research/reports/TR-1999-01.pdf

Chris Shoemaker wrote:

> Yes, I think you're right about the original design. And I guess we'd want to preserve that capability. Or would we? I'm having a little trouble seeing why this was the intended use. I figure, there are three cases:
>    A) If you have access to both source and dest, it doesn't really matter too much who writes the batch -- this is like the local copy case.
>    B) If you have access to the dest but not the source, then you need the client to write the batch -- and it's not far-fetched that you might have other copies of dest to update.
>    C) However, having access to source but not dest is the only case that _requires_ the sender to write the batch -- now what's the chance that you'll have another identical dest to apply the batch to?  And if you did, why wouldn't you generate the batch on that dest as in case A, above?
> So, it seems to me that it's much more useful to have the receiver/client write the batch than sender/client, or receiver/server, or sender/server. But, maybe I'm just not appreciating what the potential uses of batch-mode are.
>
> Survey: so who uses batch-mode and what for?

I haven't used the feature, but back when I read the docs on rsync+ I thought it was a clever way to do multicasting on the cheap. I think the only scenario where batch mode makes sense is when you need to distribute updates from a particular archive to a (large) number of mirror sites and you have tight control over the state of both client and server (so that you know exactly what needs to be updated on the mirror sites). This ensures that you can create a set of batch files that contain *all* the changes necessary for updating each mirror site.

So basically I would use batch mode if I had a situation in which:

1) all mirror sites have the same set of files
2) rsync is invoked from each mirror site in exactly the same way (i.e. same command-line options) to pull data from a master server

then instead of having N sites invoke rsync against the same archive, I would invoke it once, make it write out a set of batch files, then transfer the batch files to each client and run rsync locally using the batch set. The advantage of this is that the server only performs its computations once. An example of this usage would be using rsync to upgrade a Linux distribution, say going from FC 1 to FC 2. All files from each distribution are frozen, so you should be able to create a single batch that incorporates all the changes and then apply it on each site carrying the distro.
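
To make the "compute once, replay N times" idea concrete, here is a toy sketch in Python. It is only a model of the workflow and not rsync's delta algorithm or its batch file format: a tree is just a dict of path to contents, and the names make_batch and apply_batch, the file names, and the version strings are all made up for illustration.

```python
# Toy model only: a "tree" is a dict mapping path -> file contents.
# This is NOT rsync's delta algorithm or its on-disk batch format.

def make_batch(old_tree, new_tree):
    """Compute, once, the changes that turn old_tree into new_tree."""
    return {
        "delete": sorted(p for p in old_tree if p not in new_tree),
        "write": {p: data for p, data in new_tree.items()
                  if old_tree.get(p) != data},
    }

def apply_batch(mirror_tree, batch):
    """Replay a previously computed batch on a mirror, without the master."""
    updated = dict(mirror_tree)
    for path in batch["delete"]:
        updated.pop(path, None)
    updated.update(batch["write"])
    return updated

# Hypothetical frozen distributions (contents are placeholders).
fc1 = {"kernel.rpm": "2.4", "glibc.rpm": "2.3.2", "old-tool.rpm": "1.0"}
fc2 = {"kernel.rpm": "2.6", "glibc.rpm": "2.3.2", "new-tool.rpm": "1.0"}

# The expensive comparison happens exactly once, on the master side.
batch = make_batch(fc1, fc2)

# Condition 1 above: every mirror starts out with the same set of files.
mirrors = [dict(fc1) for _ in range(3)]
updated = [apply_batch(m, batch) for m in mirrors]
all_match = all(m == fc2 for m in updated)

# A mirror that has drifted from the expected state is not brought in sync,
# because the batch never mentions the file that differs.
drifted = {"kernel.rpm": "2.4", "glibc.rpm": "2.3.1-local", "old-tool.rpm": "1.0"}
drifted_ok = apply_batch(drifted, batch) == fc2

print("identical mirrors updated correctly:", all_match)
print("drifted mirror updated correctly:", drifted_ok)
```

In this model the three identical mirrors end up matching the new tree, while the drifted one does not, which is why the two conditions above matter: the batch only encodes the difference between two specific states.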

The question of whether the batch files should be on the client or server side is not easy to answer and in the end depends on exactly what you're trying to do. In general, I would say that since the contents of the batch files depend on the state of both client and server, there is no "natural" location for them.


-- Alberto

********************************************************************
Alberto Accomazzi                      aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                        ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics      www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
********************************************************************
