On Wed, Feb 12, 2003 at 01:13:45AM -0600, Adam Herbert wrote:
> I need some suggestions. Here's my setup:
> 
>       800GB of Data
>       14,000,000+ Files
>       No changes just additions
>       Files range in size from 30k - 190k
> 
> The files are laid out in a tree fashion like:
> 
> BASE
>    \-Directory ( Numerical Directory name from 0 - 1023 )
>      \-Directory ( Numerical Directory name from 0 - 1023 )
>        \- Files ( Up to 1024 files each directory )
> 
> 
> This allows for a maximum of about a billion files. I need to limit the
> amount of memory usage and processor/IO time it takes to build the
> list of files to transmit. Is there a better solution than rsync? Are
> there patches that would help rsync in my particular situation?

Rsync's real advantage is its delta-transfer when existing files
change.  With additions only, that advantage is moot.

Certainly running rsync on the whole tree at once will
probably use more memory than you want, since it builds the
complete file list up front.  You could instead loop through
the second-level directories, running rsync on each one.
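
A minimal sketch of that loop, assuming ssh transport; $BASE,
$DESTHOST and $DESTBASE are made-up placeholders, not anything
from your setup:

        #!/bin/sh
        # One rsync run per second-level directory, so each invocation
        # only has to build a small file list.
        BASE=/data/base
        DESTHOST=backuphost
        DESTBASE=/data/base
        cd "$BASE" || exit 1
        for dir in */*/; do
                # -R preserves the relative path on the destination side
                rsync -aR "${dir%/}" "$DESTHOST:$DESTBASE/"
        done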

My inclination here would be to roll your own.
Something as simple as

        # $newstamp and $laststamp are timestamp files marking runs;
        # $BASE and $dest are the tree root and the remote host.
        touch "$newstamp"
        cd "$BASE" || exit 1
        find . -newer "$laststamp" |
                cpio -o -H crc | ssh "$dest" "cd $BASE && cpio -idum"
        mv "$newstamp" "$laststamp"

may be sufficient.  Building the file list with comm -23 on
the sorted outputs of "find . -type f -print" on source and
dest may be more reliable.
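
For instance, a sketch along the same lines as above
(assuming $dest and $BASE are set as before):

        # Sketch only: list files present on the source but missing on
        # the destination, then ship them with the same cpio-over-ssh pipe.
        cd "$BASE" || exit 1
        find . -type f -print | sort > /tmp/src.list
        ssh "$dest" "cd $BASE && find . -type f -print | sort" > /tmp/dst.list
        comm -23 /tmp/src.list /tmp/dst.list |
                cpio -o -H crc | ssh "$dest" "cd $BASE && cpio -idum"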

For that matter it might be worthwhile building the
infrastructure to replicate the files at creation time.  The
structure you describe indicates to me that the files are
created by an automated process; build the replication into
that process.
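
One possible shape for that, purely as a sketch: store_file,
$BASE and $DESTHOST are made-up names, and per-file scp is just
the simplest thing that could work.

        # Hypothetical wrapper the creating process could call: write the
        # file locally, then push the copy to the replica right away.
        # usage: store_file <top> <sub> <name> < data
        store_file() {
                dir="$1/$2"
                mkdir -p "$BASE/$dir"
                cat > "$BASE/$dir/$3"
                ssh "$DESTHOST" "mkdir -p $BASE/$dir"
                scp -q "$BASE/$dir/$3" "$DESTHOST:$BASE/$dir/$3"
        }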

-- 
________________________________________________________________
        J.W. Schultz            Pegasystems Technologies
        email address:          [EMAIL PROTECTED]

                Remember Cernan and Schmitt