On Feb 12, 2008 7:53 AM, Tim Brody <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I have a 75GB collection of data, including a lot of duplicated files,
> on an NTFS network drive. I want to back up that data across a DSL link
> to a Linux host. Currently I use cwrsync on a Windows machine to act as
> server to the Linux rsync client.
>
> I want to avoid transferring duplicated data, as the DSL link is a far
> more significant factor than computation/disk IO. I can't work out
> whether rsync (or any patch) can be made smart enough to spot duplicate
> files regardless of file location (like fdupes or similar). Because
> this data is coming off a network drive, there's no way I can hard-link
> (or the NTFS equivalent) duplicates in the source tree, so it needs to
> happen in rsync.
>
> I've tried using the --detect-renamed patch on 3.0.0 in the following
> (made-up) setup:
>
> src/
> src/dup
> src/dup/tardis.mp3
> src/tardis.mp3
> src/tardis2.mp3
>
> ../rsync-3.0.0pre9/rsync -avi --detect-renamed --fuzzy --checksum src/ dest/
> building file list ... done
> .d..t...... ./
> >f+++++++++ tardis.mp3
> >f+++++++++ tardis2.mp3
> cd+++++++++ dup/
> >f+++++++++ dup/tardis.mp3
>
> sent 167076 bytes
>
> which is 3x the size of tardis.mp3.
>
> If I remove tardis2.mp3:
>
> >f+++++++++ tardis2.mp3
>
> sent 536 bytes  received 526 bytes  193.09 bytes/sec
>
> If I remove dup/tardis.mp3:
>
> >f+++++++++ dup/tardis.mp3
>
> sent 55801 bytes  received 34 bytes  111670.00 bytes/sec
>
> I've found some threads about duplicate files and the detect-renamed bug
> mentioned above, but nothing specifically about doing a blanket search
> for duplicates similar to fdupes.
>
> Any suggestions would be helpful.
>
> Thanks,
> Tim.
Could you run fdupes (or an equivalent find pipeline) to build a list of the duplicate files, and then feed that list to rsync via its --exclude-from (or --include-from) option?

Jon
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html