Hi there,

I'm using rsync with some large trees of files (on one
disk, we have 30M files, for example, and we might
be copying, say, 500k files in one tree).  The file trees
are reasonably balanced -- no single directory has thousands
of files in it, for example.  Our file system, at the moment,
is ext3.  We are very comfortable with it, and are hesitant
to switch away from it, though JFS or ReiserFS could be
persuasive if people's experience strongly suggests that
they would help.  My guess is that because the tree is
reasonably balanced, changing filesystems isn't going
to have a major effect on how big a bottleneck the filesystem
may be.

ANYWAY, the point is, as you've guessed, that I hate having
to wait 20 or 30 minutes for a transfer to start
(even when I'm copying to a location that doesn't
have anything there yet, and thus there are no deltas
to figure out).

I've never really asked about this, because my assumption has
always been that it takes that long because it simply takes
that long to scan the disks, populate rsync's data structures,
and get the show on the road, and that if I want it quicker,
then I can darn well get faster disks, etc.

(a) is that assumption correct?  Or am I missing anything?

(b) for those of you who understand rsync internals better
        than I do (e.g. anyone at all who's done anything with the
        code :P) -- is there any possibility of rsync-in-daemon
        mode being able to leverage the File Alteration Monitor
        (FAM) efforts in order to cheaply maintain a more-or-less
        up-to-the-moment map of the trees it is exporting?
        (I have reservations about this, because I seem to recall
        that FAM was *not* designed to watch
        *vast* portions of huge filesystems -- more that
        it was designed for monitoring specific resources.)

        For that matter, is this not the sort of thing that
        ReiserFS, with its evolution towards a pluggable
        architecture, might be perfect for?
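        To make the daemon idea concrete, here is a purely hypothetical
        sketch (in Python, just for illustration -- rsync has no such
        feature) of the kind of in-memory map a daemon might maintain:
        one full scan to seed it, then cheap incremental updates driven
        by FAM-style change events, assumed here to arrive as simple
        (op, path) pairs:

```python
import os

# Hypothetical sketch only: the sort of path -> (size, mtime) map an
# rsync daemon could keep current from monitor events (FAM or similar)
# instead of rescanning the whole tree for every transfer.

class TreeMap:
    """Seeded by one full walk, then updated per change event."""

    def __init__(self, root):
        self.root = root
        self.entries = {}
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                self.entries[path] = (st.st_size, st.st_mtime)

    def apply_event(self, op, path):
        # 'op' is assumed to be 'create', 'modify', or 'delete',
        # roughly what a FAM-style monitor might report.
        if op == 'delete':
            self.entries.pop(path, None)
        else:
            st = os.stat(path)
            self.entries[path] = (st.st_size, st.st_mtime)
```

        With something like this, the daemon's file list for a transfer
        would already be in memory, so the 20-30 minute startup scan
        disappears (at the cost of trusting the event stream).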

(c) I assume that it would be folly (i.e. something that complicates
        the problem space substantially) to try to write something
        that simply started copying, and built the map as it
        went along, or in the background (though I could see
        this being very interesting for situations where one's
        network was *much* slower than one's disks).
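        The "start copying while you scan" idea in (c) is at least easy
        to sketch.  The following is illustrative only, not anything
        rsync actually does: a lazy walk that yields each file as it is
        discovered, so a transfer loop could begin sending after
        reading just the first directory:

```python
import os

# Hypothetical sketch of idea (c): hand files to the transfer loop as
# they are found, rather than building the complete file list up front.

def files_as_found(root):
    """Yield file paths one directory at a time."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path

# A transfer loop would then consume the generator:
#   for path in files_as_found('/export/tree'):
#       send(path)   # 'send' is a placeholder for the actual transfer
```

        The complication, of course, is that the real protocol wants
        the file list up front for delta negotiation and ordering,
        which is presumably where the "folly" comes in.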

One of the reasons I ask is that I've often come across rsync
being used as a sort of lazy filesystem mirroring tool, the
point being to sync with a remote filesystem every,
say, 10 minutes.  Which is fine, until the file tree grows
too large to scan in 10 minutes, in which case you have to
(a) reduce the transfer frequency, and (b) resign yourself
to having your I/O subsystem running flat out *all the time*.

Also, with the "monolithic" scan, the filesystem can easily
change between the scan being done and the actual directory/file
in question being copied.  Might it not be better all round
to walk the tree progressively, making a sync plan for each
"leaf node" of the tree as one reaches it?
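By "progressive walk" I mean something like the following sketch
(again purely hypothetical, not rsync's actual behaviour): gather the
stat information for a directory only when the walk reaches it, so
each directory's sync plan is as fresh as possible and the window for
the filesystem to change underneath it is small:

```python
import os

# Hypothetical sketch of the progressive walk: build the sync plan for
# each directory at the last possible moment, instead of one monolithic
# scan that may be 20-30 minutes stale by the time a file is copied.

def per_directory_plans(root):
    """Yield (directory, plan) pairs, where plan lists (name, size,
    mtime) for each regular file in that directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        plan = []
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            plan.append((name, st.st_size, st.st_mtime))
        yield dirpath, plan
```

A syncer consuming this generator would copy each directory right
after planning it, rather than planning everything first.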

Anyway, I'd be interested in what people think -- this is an
awesome tool, and if there's any chance that addressing
some of these things is technically possible, I'd like to
know.  (You never know, I might be able to help get the work
done, or at least fund someone.)

All the best,

-Cedric


-- 
-
|  CCj/ClearLine - Unix/NT Administration and TCP/IP Network Services
|  118 Louisa Street, Kitchener, Ontario, N2H 5M3, 519-741-2157
\____________________________________________________________________
   Cedric Puddy, IS Director            [EMAIL PROTECTED]
     PGP Key Available at:              http://www.thinkers.org/cedric

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
