I found a way of copying files from one drive to another that is significantly faster than "cp -a"... (this is just the sort of geeky++ type stuff you guys like to read, I bet.)
See if you can follow along here and see what I did. The "cvs.gnome.org" directory contains a checkout of the "gnome" and "CVSROOT" modules only.

    [EMAIL PROTECTED]:~ # du -hs /usr/local/src/cvs.gnome.org
    260M    /usr/local/src/cvs.gnome.org

First, time the "cp -a".

    [EMAIL PROTECTED]:~ # time cp -a /usr/local/src/cvs.gnome.org /mnt/tmp/src
    cp -a /usr/local/src/cvs.gnome.org /mnt/tmp/src  0.37s user 13.81s system 27% cpu 51.674 total

Now let's try using "tar" commands.

    [EMAIL PROTECTED]:~ # time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | (cd /mnt/tmp/src/ && tar pxf -)
    ( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.77s user 8.02s system 16% cpu 53.759 total
    ( cd /mnt/tmp/src/ && tar pxf - )  0.68s user 12.58s system 24% cpu 53.757 total

Hmmm. That took slightly longer. Let's try "cpio".

    # time (cd /usr/local/src/ && find cvs.gnome.org -print0 | cpio -p0 /mnt/tmp/src)
    387800 blocks
    ( cd /usr/local/src/ && find cvs.gnome.org -print0 | cpio -p0 /mnt/tmp/src )  0.62s user 20.14s system 33% cpu 1:01.40 total
    [EMAIL PROTECTED]:~ # rm -rf /mnt/tmp/cvs.gnome.org

That was a lot slower. Both "find" and "cpio" must stat every file. There is no benefit to having two processes at work here. Let's try something else.

I seem to recall seeing some kind of buffering program meant for use when copying things across the network or to a tape drive using "tar", one time when I ran "dselect" and browsed the great plethora of available software packages... A quick "apt-cache search 'buffer'" gives me a 92 line list, from which I choose the one I need:

    [EMAIL PROTECTED]:~ # apt-get install 'buffer'
    Reading Package Lists... Done
    Building Dependency Tree... Done
    The following NEW packages will be installed:
      buffer
    0 packages upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
    Need to get 12.6kB of archives. After unpacking 77.8kB will be used.
    Get:1 http://zeus.kernel.org unstable/main buffer 1.19-1 [12.6kB]
    Fetched 12.6kB in 0s (17.9kB/s)
    Selecting previously deselected package buffer.
    (Reading database ... 189510 files and directories currently installed.)
    Unpacking buffer (from .../buffer_1.19-1_i386.deb) ...
    Setting up buffer (1.19-1) ...

    [EMAIL PROTECTED]:~ # buffer --help
    buffer: invalid option -- -
    Usage: buffer [-B] [-t] [-S size] [-m memsize] [-b blocks] [-p percent] [-s blocksize] [-u pause] [-i infile] [-o outfile] [-z size]
    -B = blocked device - pad out last block
    -t = show total amount written at end
    -S size = show amount written every size bytes
    -m size = size of shared mem chunk to grab
    -b num = number of blocks in queue
    -p percent = don't start writing until percent blocks filled
    -s size = size of a block
    -u usecs = microseconds to sleep after each write
    -i infile = file to read from
    -o outfile = file to write to
    -z size = combined -S/-s flag

Ok, let's try it...

    [EMAIL PROTECTED]:~ # time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | buffer -m 8m | (cd /mnt/tmp/src/ && tar pxf -)
    ( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.55s user 6.11s system 12% cpu 53.905 total
    buffer -m 8m  0.14s user 2.10s system 4% cpu 53.914 total
    ( cd /mnt/tmp/src/ && tar pxf - )  0.84s user 16.51s system 32% cpu 53.910 total
    [EMAIL PROTECTED]:~ # rm -rf /mnt/tmp/src/cvs.gnome.org

    [EMAIL PROTECTED]:~ # time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | buffer -m 8m -p 75 | (cd /mnt/tmp/src/ && tar pxf -)
    ( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.72s user 3.82s system 11% cpu 39.447 total
    buffer -m 8m -p 75  0.15s user 2.39s system 6% cpu 39.544 total
    ( cd /mnt/tmp/src/ && tar pxf - )  0.59s user 12.07s system 32% cpu 39.539 total

Wow! Not bad, huh?
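For anyone who wants to reuse that last pipeline, here is a minimal sketch of it wrapped in a shell function. The name "copy_tree" and its two arguments are made up for illustration and were not part of the session above. The "-m 8m -p 75" flags are simply the ones that did best in these timings, and the fallback to "cat" when "buffer" is not installed is an assumption for convenience, not something I timed.

```shell
# copy_tree SRC_DIR DEST_PARENT
# Hypothetical helper (not from the session above): copies SRC_DIR into the
# existing directory DEST_PARENT with the "tar c | buffer | tar x" pipeline.
copy_tree() {
    src=$1
    dest=$2
    parent=$(dirname -- "$src")   # directory that contains the tree
    name=$(basename -- "$src")    # name of the tree itself

    if command -v buffer >/dev/null 2>&1; then
        # Same flags as the fastest run: 8 MB of shared memory, and do not
        # start writing until the queue is 75% full.
        (cd "$parent" && tar pcf - "$name") \
            | buffer -m 8m -p 75 \
            | (cd "$dest" && tar pxf -)
    else
        # Assumption: without buffer(1), fall back to a plain pipe.
        (cd "$parent" && tar pcf - "$name") \
            | cat \
            | (cd "$dest" && tar pxf -)
    fi
}
```

With that defined, the copy timed above would be "copy_tree /usr/local/src/cvs.gnome.org /mnt/tmp/src". Whether it beats "cp -a" on your hardware depends on the relative speed of the two disks, so time it yourself before relying on it.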
    Filesystem                              Size  Used Avail Use% Mounted on
    /dev/ide/host0/bus0/target0/lun0/part3   27G   16G   11G  59% /
    /dev/ide/host0/bus0/target0/lun0/part1   29M  6.3M   21M  23% /boot
    shm                                     2.8G     0  2.8G   0% /var/shm
    /dev/md/0                                55G  234M   55G   1% /mnt/tmp

YMMV, since:

    # hdparm -t /dev/hda3 /dev/md/0

    /dev/hda3:
     Timing buffered disk reads:  64 MB in 2.50 seconds = 25.60 MB/sec

    /dev/md/0:
     Timing buffered disk reads:  64 MB in 1.10 seconds = 58.18 MB/sec

... the RAID0 (software raid 0 on UDMA 100 EIDE) destination is much faster than the source filesystem. That is why filling the buffer before starting to write helped the timing so much. In this case, having more than one process at work is beneficial.

The difference between the "find | cpio" case and the "tar c | buffer | tar x" case seems analogous to what we do: if you just point out the bugs, it takes longer for them to get fixed than if you submit a patch. Can you see what I mean by that?

In "find | cpio", "find" is just walking the filesystem, handing file names off to "cpio", which must then stat and read each file itself, and then also write it back out to the new location. In the "tar c | buffer | tar x" case, though, the "tar c" is making its own list of files, then packing them up and piping the whole bundle off to the buffer (our BTS?), where it is then ready to be unpacked by the "tar x".

Hmmm. "cpio" doesn't know how to find, it just knows how to archive or copy through... Many of you don't know how to fix the code when you find a bug, yet. Nor do I. Often enough it's way over my head. Often enough the BTS already contains a report about the bug I just found. :-)

It's late and I'm rambling and I don't feel like editing this story any longer. Just thought I'd share my findings. Hope it helps someone.

-- 
Karl M. Hegbloom
mailto: [EMAIL PROTECTED]