On 01/ 5/10 10:01 AM, Richard Elling wrote:
How are the files named?  If you know something about the filename
pattern, then you could create subdirs and mv large numbers of files
to reduce the overall size of a single directory.  Something like:

    mkdir .A
    mv A* .A
    mkdir .B
    mv B* .B
    ...

I doubt that would be a faster option, unless you can be certain the file naming coincides with the unsorted order of the files in the directory. If A* does not occur at the beginning of the directory's contents, finding those files will be painful: the above process would add many cycles of scanning through all 60 million directory entries, and each move request will churn the vnode cache.


A while back I did some experimenting with millions of files per directory. Note that the time estimates are overstating how long it will take; the more files you remove, the faster it will go. I would be trying to get an unsorted read of the directory, and delete the files in that order. This is not just to save the time it takes to sort the output; it will also minimize vnode cache churn and the time to remove each object, since each remove request must iterate the directory looking for the object to remove. Newer ON builds support the -U option to ls, for unsorted output. I don't know what may exist on S10. FWIW, I copied the 'ls' binary from an ON build 128 machine to /tmp/myls on a S10 machine, and it appeared to work - I don't know if there are any issues/risks with doing that.
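If copying a binary between builds makes you nervous, a perl one-liner should give the same unsorted listing on stock S10, since readdir() hands entries back in directory order. This is a substitute I'm suggesting, not something tested here, and /bigdir is a placeholder for the real directory:

    perl -e 'opendir(D, "/bigdir") or die $!;
             while (defined($f = readdir(D))) {
                 # print names in raw directory order, skipping . and ..
                 print "$f\n" unless $f eq "." or $f eq "..";
             }
             closedir(D);'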


Since it's a Niagara system, it might go faster if you can get multiple removes going in parallel - but only if all the parallel remove requests are for files near the beginning of the directory's contents. If you can't get an unsorted list of files, then multiple threads will just add to the vnode cache thrashing.


It might be worth trying something like this:

    ls -U > remove.sh

Make it a bash script: prepend rm -f to each line, and append an & to each line. Maybe every few hundred lines put in a wait, in case the rm's can be kicked off significantly faster than they can be completed - you don't want millions of rm's to get started. A sketch follows below.
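Here is a minimal sketch of that recipe, assuming /tmp/myls is the copied ON ls and /bigdir is the directory to empty (both names are placeholders of mine), and assuming the filenames contain no quotes or newlines:

    /tmp/myls -U /bigdir | nawk '
        # one backgrounded rm per file, always near the front of the dir
        { printf "rm -f \"/bigdir/%s\" &\n", $0 }
        # every 200 removes, wait for the batch to drain so we never
        # have millions of rm processes in flight at once
        NR % 200 == 0 { print "wait" }
        END          { print "wait" }
    ' > remove.sh
    bash remove.sh

The wait every 200 lines is the throttle described above; tune the batch size to taste.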

You'll have to wait for ls to do one unsorted read of the directory. Then you will get parallel remove requests going, always on files at the beginning of the directory, so there should be minimal vnode churn during the removes.

Starting the new processes for the removes may counteract the benefit of parallelizing, and make this slower. But since it's a Niagara system, you may have the spare CPU cycles to waste anyway. It's just another idea to try...
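If process creation does turn out to be the bottleneck, one variant (my suggestion, untested at this scale) is to batch many names per rm invocation instead of forking one rm per file, while still walking the unsorted order:

    # serial, but only one fork/exec per ~100 files; breaks on
    # filenames containing spaces or newlines
    /tmp/myls -U /bigdir | sed 's|^|/bigdir/|' | xargs -n 100 rm -f

You lose the parallelism, but each remove still hits a file near the front of the directory, which is the part that matters for the vnode cache.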
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
