Dave,

Sure, I'll go ahead and benchmark the difference. I should be able to get
something out by this weekend. I do think this optimization should be made.
I can think of a bunch of situations where the design of the system gives
the programmer extra information that could improve efficiency (and be
simpler than recursing and including the required parent directories) if
they are allowed to drive rsync efficiently. That's always a good thing.

In my case, there are 700 directories with around 700 files each. The files
themselves are small (500 bytes or 1.5 KB, depending on the application).
Files that change will be randomly distributed across the directories. So
if, say, 20 files change, they are likely to be in 20 different directories,
and rsync will go through and try to match against 14,000 files (20
directories x 700 files). Since these files are small, and transferring them
from machine to machine on the same Ethernet segment should be extremely
fast, the proportion of time spent in searches and scans could be larger than
desirable. And what happens if we decide to put all 500,000 files in one
directory and rsync changes in 10 of them every second? And then run 10
rsyncs in parallel to sync with 10 servers in our server farm?
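The arithmetic in the paragraph above can be made explicit with a short
Python sketch. It assumes rsync has to walk every file in each directory it
is pointed at, that changed files land in uniformly random directories, and
that the flat-directory case means one run per second per target server. The
function names are illustrative only; nothing here is part of rsync.

```python
# Rough count of file-list entries rsync walks per run, ASSUMING it must
# scan every file in each directory it is sent to.

FILES_PER_DIR = 700
NUM_DIRS = 700


def files_scanned(changed: int, files_per_dir: int = FILES_PER_DIR,
                  num_dirs: int = NUM_DIRS) -> int:
    """Worst case: each changed file lives in a different directory."""
    return min(changed, num_dirs) * files_per_dir


def expected_dirs_touched(changed: int, num_dirs: int = NUM_DIRS) -> float:
    """Expected number of distinct directories hit when each changed file
    independently lands in a uniformly random directory."""
    return num_dirs * (1 - (1 - 1 / num_dirs) ** changed)


def flat_dir_scans_per_second(total_files: int, runs_per_second: int,
                              parallel_rsyncs: int) -> int:
    """One flat directory: every run scans every file, once per target."""
    return total_files * runs_per_second * parallel_rsyncs


if __name__ == "__main__":
    scanned = files_scanned(20)
    print(f"entries scanned for 20 changes: {scanned}")      # 14000
    print(f"useful fraction: {20 / scanned:.2%}")            # 0.14%
    print(f"expected dirs touched: {expected_dirs_touched(20):.1f}")  # 19.7
    print(f"flat dir, 10 targets: {flat_dir_scans_per_second(500_000, 1, 10)}")
```

The expected-directories figure (about 19.7 for 20 changed files) is why
"20 different directories" is a fair approximation, and the flat-directory
case works out to 5,000,000 entries scanned per second to find 100 changes.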

August

I should add that we are also looking at more complex solutions, such as the
Coda filesystem, which are probably more appropriate for what we want to
accomplish. But we like sweet and simple, and rsync is that.



> -----Original Message-----
> From: Dave Dykstra [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 04, 2001 11:23 AM
> To: [EMAIL PROTECTED]
> Cc: August Zajonc
> Subject: Re: Transfering File List
>
>
> On Wed, Jan 03, 2001 at 05:25:42PM -0500, Alberto Accomazzi wrote:
> > In message <[EMAIL PROTECTED]>, Bennett Todd writes:
> >
> > > Even nicer, in my opinion, would be a mode where rsync could be told
> > > to take a src dir and a dst dir as cmdline args, then simply reads
> > > paths from stdin, and as each path is read, sync from that src file
> > > under the src dir to the corresponding dst file under the dst dir;
> > > repeat until eof on stdin. That'd make it easy for a process that
> > > periodically modifies one or another file in a potentially large
> > > tree, to simply send notifications to a persistent rsyncer that
> > > takes care of efficiently replicating those changes over to the
> > > other side.
> >
> >
> > I second that, although I haven't had the real need to have such an
> > interface to rsync so far.  After reading the slew of messages on this
> > mailing list from people confused about --include and --exclude, it's
> > clear to me that it would make sense to have the option to just give
> > rsync a list of files (or directories) to transfer.  If I were to do
> > this I would probably implement it as it is in GNU tar:
> >
> >    rsync --files-from=FILE
>
>
> I like that idea.  I would call it "--only-from".  It could be implemented
> almost exactly like my old optimization (which doesn't change the rsync
> protocol) and it wouldn't change the semantics of include and exclude.
> I expect Andrew wouldn't have any objection to that (tell me if I'm wrong,
> Andrew).
>
> The different semantics (of not having to include the parent directories
> in the list) is almost enough reason by itself to have this option, but
> I would like to also see some evidence that it makes a difference in
> performance before I develop the code.  August, would you be willing to
> measure the difference in elapsed & cpu time to copy your 800 or so files
> via an include-from/exclude '*' with an rsync 2.3.2 client with and without
> the optimization?  To turn off the optimization, all you need to do is add
> a wildcard to one of the paths (making sure that it doesn't pull in any
> additional files of course).  When the optimization is off, you will need
> to include all parent directories of course.  The server side doesn't need
> to be rsync 2.3.2, it can be any version.
>
> > where FILE could be "-" to mean STDIN.
>
> Why not.  And it would be easy to add that capability to --include-from
> and --exclude-from at the same time.
>
> - Dave
>
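Bennett's stdin-driven mode in the quoted message above can be approximated
outside rsync with a small wrapper, which is handy for trying out the
interface. This Python sketch is a stand-in, not the proposed option: it
forks one `rsync -aR` per path, so it pays full process and connection setup
for every file and gets none of the efficiency a built-in persistent mode
would have. The names `rsync_command`, `open_list` and `serve` are
hypothetical. Following Alberto's suggestion, a list argument of `-` means
standard input.

```python
import subprocess
import sys
from typing import Iterable, TextIO


def rsync_command(rel_path: str, dst: str, rsync: str = "rsync") -> list[str]:
    """Command that copies one path (relative to the source root) to the
    same relative location under dst. -R (--relative) makes rsync recreate
    the leading directories on the receiving side; the command must run
    with its working directory set to the source root."""
    rel = rel_path.rstrip("\n").lstrip("/")
    if not rel or ".." in rel.split("/"):
        raise ValueError(f"path not under the source root: {rel_path!r}")
    return [rsync, "-aR", rel, dst]


def open_list(name: str) -> TextIO:
    """--files-from style argument: '-' means standard input."""
    return sys.stdin if name == "-" else open(name, encoding="utf-8")


def serve(src: str, dst: str, lines: Iterable[str]) -> int:
    """Sync each path as it is read, until EOF. Returns the failure count."""
    failures = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        try:
            cmd = rsync_command(line, dst)
        except ValueError as err:
            print(err, file=sys.stderr)
            failures += 1
            continue
        if subprocess.run(cmd, cwd=src).returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    if len(sys.argv) not in (3, 4):
        sys.exit(f"usage: {sys.argv[0]} SRC_DIR DST [LIST|-]")
    src_dir, dst = sys.argv[1], sys.argv[2]
    with open_list(sys.argv[3] if len(sys.argv) == 4 else "-") as stream:
        sys.exit(1 if serve(src_dir, dst, stream) else 0)
```

Running `-aR` from inside the source root is what lets a bare relative path
such as `dir017/file0042.dat` land at the same place under the destination
without pre-creating its parent directories there.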

