Matt,

At a previous employer I evaluated rsync for migrating a large number of files
(1,000,000+) from one server to another and, after testing, found that it was
not the right solution for that application: the time needed to build the file
lists and the memory rsync required did not meet our requirements. You have not
stated your service-level expectations, so I will assume you are not looking
for real-time replication. Instead of rsync we used a volume synchronization
product that maintained real-time synchronization (a bonus, since that was not
actually one of our requirements).

If you are set on rsync, you may still reach your goal by breaking the data
into multiple file systems or directories, to keep the per-directory inode
tables small, and then running a separate rsync for each file system or
directory independently (a rough sketch of this appears after the quoted
digest below). It may also be, however, that rsync is simply not the right
architecture for this application.

I hope this helps.

Bob

[EMAIL PROTECTED] wrote:

> Today's Topics:
>
>   1. How to copy 1,000,000,000 files efficiently (Matt Simonsen)
>   2. Re: How to copy 1,000,000,000 files efficiently (Fabien Penso)
>   3. RE: How to copy 1,000,000,000 files efficiently (Nemholt, Jesper Frank)
>
> --__--__--
>
> Message: 1
> From: "Matt Simonsen" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Subject: How to copy 1,000,000,000 files efficiently
> Date: Fri, 4 May 2001 14:25:50 -0700
>
> Hello all-
>
> We are in the process of developing a system which will need to copy one
> million small (10k) files daily from one directory on one server to ten
> others. Many of the files will not change.
>
> Until now we have been using the "rsync -e ssh" approach, but as we have
> started to add files (we are at 75,000) the time to generate a file list
> plus copy the files is far too slow. Is there a way to efficiently
> distribute the data to all 10 servers without building the file list ten
> times? Any tips that wise gurus could share would be very appreciated.
> Also, is there a performance benefit from running rsync as a daemon?
> Finally, is there any other tool we could use, with or without rsync, to
> help this process go faster? As it stands now we believe that with the
> files in multiple directories the process goes faster, based on our
> initial tests.
>
> Thanks
> Matt
>
> --__--__--
>
> Message: 2
> To: "Matt Simonsen" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Subject: Re: How to copy 1,000,000,000 files efficiently
> From: Fabien Penso <[EMAIL PROTECTED]>
> Organization: LinuxFr: http://www.linuxfr.org/
> Date: 04 May 2001 23:38:12 +0200
>
> > Hello all-
> > [...]
> > Thanks
> > Matt
>
> Looking at the subject, I was about to run my Emacs/Gnus junk-complaint
> function :-)) Sorry, off topic, but still fun.
>
> Cheers to the rsync devel team, I use that piece of software every day!
> --__--__--
>
> Message: 3
> From: "Nemholt, Jesper Frank" <[EMAIL PROTECTED]>
> To: 'Matt Simonsen' <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: RE: How to copy 1,000,000,000 files efficiently
> Date: Sat, 5 May 2001 01:35:23 +0100
>
> > -----Original Message-----
> > From: Matt Simonsen [mailto:[EMAIL PROTECTED]]
> > Sent: viernes, 04 de mayo de 2001 23:26
> > To: [EMAIL PROTECTED]
> > Subject: How to copy 1,000,000,000 files efficiently
> >
> > Hello all-
> > [...]
>
> I recently tried transferring about 2 million files. It took about 2-3
> hours to generate the file list and allocated roughly 1 GB of RAM.
> All the files were in the same directory, which hurt performance a lot, at
> least for the transfer; I don't know how much impact it had on building
> the file list. Most of this performance hit comes from the fact that most
> file systems slow down when accessing a directory that holds a lot of
> files. My transfer was on a Tru64 box running AdvFS. I've tried similar
> transfers with far fewer files on Linux running ext2 (a horror story) and
> with ReiserFS (a little better).
> ...so it's fairly important to keep the number of files in every directory
> limited.
>
> If possible, the best approach would probably be to make the process that
> generates the files take care of copying them to the 10 destinations
> immediately (replicate on create).
>
> Another good approach is to let the process that generates the files
> create them in a temporary directory structure and then let something like
> rsync replicate them (and delete on success). This keeps the source
> structure fairly small at all times.
>
> If you can't avoid a situation where you have a truckload of files,
> running several rsyncs in parallel, each taking care of a dedicated part
> of the directory structure, will speed things up, since each rsync has
> fewer files to handle and hence starts the transfer sooner than a single
> rsync scanning everything. Running several in parallel will also maximize
> the use of CPU, disk, memory and network bandwidth. You might like that,
> while someone else (other people using the network and computers) won't.
> Of course, this only works if you can, to some degree, predict and
> distribute where in your directory structure the new files will appear.
> --
> Un saludo / Venlig hilsen / Regards
>
> Jesper Frank Nemholt
> Unix System Manager
> Compaq Computer Corporation
>
> Phone : +34 699 419 171
> E-Mail: [EMAIL PROTECTED]
>
> --__--__--
>
> _______________________________________________
> rsync mailing list
> [EMAIL PROTECTED]
> http://lists.samba.org/mailman/listinfo/rsync
>
> End of rsync Digest
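
To make the split-and-parallelize suggestion concrete, here is a rough sketch
of the kind of wrapper script I have in mind. It is only an illustration: the
layout (data under /data/files, already split into subdirectories), the host
names (web1 ... web10) and the option flags are all assumptions, so adjust
them to your own environment and test before relying on it.

    #!/bin/sh
    # Sketch only: push each top-level subdirectory with its own rsync, so
    # no single rsync run has to scan (or hold in memory) the whole tree.
    SRC=/data/files            # hypothetical source tree, pre-split into subdirs
    DEST_HOSTS="web1 web2 web3 web4 web5 web6 web7 web8 web9 web10"

    for host in $DEST_HOSTS; do
        for dir in "$SRC"/*/; do
            # -a preserves permissions and times; -e ssh matches the current setup.
            rsync -a -e ssh "$dir" "$host:$dir" &
        done
        wait    # let this host's per-directory rsyncs finish before the next host
    done

Each inner rsync only has to build a file list for one subdirectory, and the
per-directory runs for a host proceed in parallel, which is essentially the
approach Jesper describes in the digest above.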
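Jesper's staging-directory idea can be sketched in the same spirit: have the
producing process write new files into a small "outgoing" tree, push only that
tree to the ten destinations, and clear it once every push has succeeded, so
rsync never has to scan the full archive. Again, the paths and host names
below are made up for illustration, and the cleanup step is deliberately
conservative.

    #!/bin/sh
    # Sketch only: replicate a small staging tree, then clear it on success.
    STAGE=/data/outgoing       # hypothetical staging area written by the producer
    DEST_HOSTS="web1 web2 web3 web4 web5 web6 web7 web8 web9 web10"
    failed=0

    for host in $DEST_HOSTS; do
        # Copy the staged files into the real tree on each destination.
        rsync -a -e ssh "$STAGE/" "$host:/data/files/" || failed=1
    done

    # Only remove the staged copies once every destination has received them.
    if [ "$failed" -eq 0 ]; then
        find "$STAGE" -type f -exec rm -f {} \;
    fi

Because the staging tree only ever holds the files created since the last run,
the file lists stay small no matter how large the overall archive grows.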