On Wed, Mar 25, 2009 at 03:10:42PM +0000, Tzafrir Cohen wrote:
> On Wed, Mar 25, 2009 at 07:53:06AM -0500, Ron Johnson wrote:
> > On 2009-03-25 05:16, Tzafrir Cohen wrote:
> >>> Tapani Tarvainen schrieb:
> >>>>> kj wrote:
> >>>>>> Now, I've been running the usual find . -type f -exec rm {} \;
> >>>>>> but this is going at about 700,000 per day. Would simply doing
> >>>>>> an rm -rf on the Maildir be quicker? Or is there a better
> >>>>>> way?
> >>>>
> >>>> While rm -rf would certainly be quicker and is obviously preferred
> >>>> when you want to remove everything in the directory, the find version
> >>>> could be sped up significantly by using xargs, i.e.,
> >>>>     find . -type f -print0 | xargs -0 rm
> >>>> This is especially useful if you want to remove files selectively
> >>>> instead of everything at once.
> >>
> >> And this requires traversing the directory not just a single time but 4
> >> million times (once per rm process). Not to mention merely spawning 4
> >> million such processes is not fun (but spawning those would probably fit
> >> nicely within 10 minutes or so)
> >
> > But isn't that (preventing the spawning of 4M processes) the reason why
> > xargs was created?
>
> $ for i in `seq 4000000`; echo hi; done | xargs /bin/echo | wc
>     252 4000000 12000000
...
> So it's indeed not 4M processes, but still quite a few. But worse:
> you're traversing the directory many times. And you're telling rm in
> which explicit order to remove files, rather than simply the native
> order of the files in the directory (or whatever is convenient for the
> implementor). Which probably requires rm to do a number of extra lookups
> in the directory.
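(aside: the batching in that xargs test is easy to pin down. the unbounded form above depends on your ARG_MAX, which is why you saw 252 processes; forcing a fixed batch size with -n makes the process count deterministic. a small sketch with illustrative numbers, not the 4M case:)

```shell
# 100000 input words, at most 1000 per /bin/echo invocation:
# xargs spawns exactly 100 processes, each printing one line.
seq 100000 | xargs -n 1000 /bin/echo | wc -l
# -> 100
```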
can you explain what you mean by "traversing"? i haven't confirmed with
strace, but i assume the only process doing open(".", O_DIRECTORY) and
getdents is the single find process. then, each of the (approx 1000) rm
processes makes about 4000 unlinks.

syscall-wise, the only differences between rm -r and find | xargs rm
would be the ~1000 extra forks and a bunch of writes and reads of the
list of filenames through the pipe. compared to the 4,000,000 unlinks in
either case, that overhead hardly seems like the worst part ;)

unless your filesystem has an optimization for removing subtrees and
your tool knows to ask for it, i'd guess you're probably spending most
of your time waiting for the filesystem to remove entries and invalidate
caches.

--Rob*

--
/-------------------------------------------------------------\
|    "If we couldn't laugh we would all go insane"            |
|                      --Jimmy Buffett,                       |
|    "Changes in Latitudes, Changes in Attitudes"             |
\-------------------------------------------------------------/
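p.s. the pipeline under discussion can be exercised end to end on a
small scratch tree (a sketch; the temp directory and the 500-file count
are arbitrary, stand-ins for the real 4M-file Maildir):

```shell
# build a small scratch tree of empty files
tmp=$(mktemp -d)
for i in $(seq 500); do : > "$tmp/f$i"; done

# find emits NUL-separated names; xargs -0 batches them into a
# handful of rm invocations instead of one fork+exec per file
find "$tmp" -type f -print0 | xargs -0 rm -f

# the directory is now empty, so it can be removed
ls -A "$tmp" | wc -l
rmdir "$tmp"
```

the -print0/-0 pair also makes the pipeline safe for filenames
containing spaces or newlines, which the plain -exec rm {} \; form
handles only at the cost of one rm process per file.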