On Wed, Mar 25, 2009 at 03:10:42PM +0000, Tzafrir Cohen (tzaf...@cohens.org.il) wrote:
> $ for i in `seq 4000000`; do echo something quite longer; done | xargs
> /bin/echo | wc
>    756 12000000 92000000
[...]
> So it's indeed not 4M processes, but still quite a few.

Even 756 is much less than 4M.
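(As a sanity check on those numbers: 92000000 bytes of arguments over
756 invocations is roughly 120 kB per exec, which fits the traditional
128 kB limit on the argument list. You can see your own system's limit
with:

getconf ARG_MAX

though the exact value varies by kernel, and xargs may choose to stay
below it anyway.)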
> But worse: you're traversing the directory many times. And you're
> telling rm in which explicit order to remove files, rather than simply
> the native order of the files in the directory (or whatever is
> convenient for the implementor). Which probably costs rm a number of
> extra lookups in the directory.

Interesting point; I hadn't thought of that. How the cost of a fork()
compares to that of reading a directory entry would depend on things
like disk and CPU speed, available memory, filesystem type &c.

To get an idea which way it falls I did a quick test with 500k files
(created by seq 500000 | xargs touch) on my box.

First on an ext3 filesystem:

rm -rf testd                          4m11.909s
find testd -type f | xargs rm         4m42.025s
find testd -type f -exec rm {} \;    62m59.030s
find testd -type f -delete            4m19.340s

Then on tmpfs:

rm -rf testd                          0m2.507s
find testd -type f | xargs rm         0m6.318s
find testd -type f -exec rm {} \;    58m34.645s
find testd -type f -delete            0m3.362s

So it would seem the number of rm calls indeed dominates the time
needed, not directory traversal. Of course xargs was helped here by the
filenames being short (at most 12 characters including the directory
name), but the speedup over -exec is still rather impressive.

If anyone can come up with a scenario where -exec is significantly
faster than xargs, I'd be interested.
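For anyone who wants to repeat the test, something along these lines
should do. This is a sketch rather than the exact commands I ran (the
file count and the testd scratch directory match the test above; adjust
to taste):

#!/bin/bash
# Recreate the test directory, then time each removal method in turn.
for cmd in \
    'rm -rf testd' \
    'find testd -type f | xargs rm' \
    'find testd -type f -exec rm {} \;' \
    'find testd -type f -delete'
do
    mkdir testd
    ( cd testd && seq 500000 | xargs touch )
    echo "== $cmd"
    time sh -c "$cmd"
    rm -rf testd   # clean up whatever the method left (e.g. the directory itself)
done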
-- 
Tapani Tarvainen