My own tests on this subject (I face the same situation) were done with 600K files 
in a single directory tree, split into 6 subdirs of 100K files each (I expect I'll need 
to handle more than 5M with our increasing production rates).

I found that a simple 'find . -type f | wc -l' would take up to 2 hours to complete 
under AIX/JFS. This could explain some rsync problems.

The weak point of rsync is the HUGE amount of memory it requires in such cases.

However, I have no choice other than rsync for my nightly backup :-(
I'll try to set up several parallel rsync processes with a filter on file names and see 
if that helps ...
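
A rough sketch of what I have in mind (the backup host name and the a-j / k-t / u-z0-9 
split are just assumptions on my side):

    #!/bin/sh
    # One rsync per file-name range, all started in parallel. The filter
    # rules let each process walk the whole tree but only pick up its own
    # share of the files, so every file list stays much smaller than in
    # one big run.
    for pat in '[a-j]*' '[k-t]*' '[u-z0-9]*'
    do
        rsync -a -e ssh \
            --include='*/' --include="$pat" --exclude='*' \
            /data/ backuphost:/backup/data/ &
    done
    wait    # block until all three transfers have finished

Each process still has to scan the directories, but its file list (and memory 
footprint) should shrink roughly in proportion to the split.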

Sylvain.

> > -----Original Message-----
> > From: Matt Simonsen [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, 04 May 2001 23:26
> > To: [EMAIL PROTECTED]
> > Subject: How to copy 1,000,000,000 files efficiently
> > 
> > 
> > Hello all-
> > 
> > We are in the process of developing a system which will need to copy
> > one million small (10k) files daily from one directory on one server
> > to ten others. Many of the files will not change.
> > 
> > Until now we have been using the "rsync -e ssh" approach, but as we
> > have started to add files (we are at 75,000) the time to generate a
> > file list and copy the files is far too slow. Is there a way to
> > efficiently distribute the data to all 10 servers without building
> > the file list ten times? Any tips that wise gurus could share would
> > be very appreciated. Also, is there a performance benefit from
> > running rsync as a daemon? Finally, is there any other tool we could
> > use, with or without rsync, to help this process go faster? As it
> > stands now, we believe that with the files in multiple directories
> > the process goes faster, based on our initial tests.
> 
> I recently tried transferring about 2 million files. It took about 2-3 hours
> to generate the file list and allocated roughly 1 GB of RAM.
> All files were in the same directory, which hurt performance a lot, at
> least for the transfer... I don't know how much impact it had on building
> the file list.
> Most of this performance hit is because most filesystems get slow
> when you have a lot of files in the same directory. My transfer was on a
> Tru64 box running AdvFS. I've tried similar transfers with far fewer files
> on Linux running ext2... horror story, and with ReiserFS... a little
> better.
> ...so it's fairly important to keep the number of files in each directory
> limited.
> 
> If possible, the best approach would probably be to have whatever generates
> the files copy each one to the 10 destinations as soon as it is created
> (replicate on create).
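>
> A minimal sketch of that idea, assuming the generator is (or can call) a
> shell script and the ten destinations are host1..host10 (hypothetical names):
>
>     #!/bin/sh
>     # Called once per newly created file: push it straight to every
>     # destination, so no big file list ever has to be built later.
>     # (Assumes flat file names; rsync -R would preserve subpaths.)
>     file="$1"                       # path of the file just written
>     for host in host1 host2 host3 host4 host5 \
>                 host6 host7 host8 host9 host10
>     do
>         rsync -a -e ssh "$file" "$host":/data/ \
>             || echo "push to $host failed" >&2
>     done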
> 
> Another good approach is to let whatever generates the files create them
> in a temporary directory structure and then have something like rsync
> replicate them (and delete on success). That keeps the source structure
> fairly small at all times.
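>
> A sketch of that variant (paths and host name are made up); the per-run
> snapshot directory avoids deleting files that are created while the
> transfer is still running:
>
>     #!/bin/sh
>     STAGE=/data/outgoing            # generator drops finished files here
>     BATCH=/data/batch.$$            # snapshot for this run
>     mkdir "$BATCH" && mv "$STAGE"/* "$BATCH"/
>     # Replicate the snapshot; remove it only if rsync reports success,
>     # which gives the "delete on success" behaviour.
>     if rsync -a -e ssh "$BATCH"/ desthost:/data/
>     then
>         rm -rf "$BATCH"
>     fi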
> 
> If you can't avoid a situation where you have a truckload of files, running
> several rsyncs in parallel, each taking care of a dedicated part of the
> directory structure, will speed things up: each rsync has fewer files to
> take care of and hence starts the transfer sooner than a single rsync
> scanning everything. Secondly, running several in parallel will maximize use
> of CPU, disk, memory and network bandwidth. You might like that while
> someone else won't (other people using the network & computers).
> Of course this only works if you can, to some degree, predict and distribute
> where in your directory structure new files will appear.
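>
> Something along these lines, for example (subdirectory names and target
> host are hypothetical):
>
>     #!/bin/sh
>     # One rsync per top-level subdirectory, all started in parallel;
>     # each instance only scans and transfers its own part of the tree.
>     for dir in 0 1 2 3 4 5 6 7 8 9
>     do
>         rsync -a -e ssh /data/"$dir"/ desthost:/data/"$dir"/ &
>     done
>     wait                            # block until every transfer is done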
> 
> --  
> Un saludo / Venlig hilsen / Regards
> 
> Jesper Frank Nemholt
> Unix System Manager
> Compaq Computer Corporation
> 
> Phone : +34 699 419 171
> E-Mail: [EMAIL PROTECTED]
> 
