On Thu, Apr 23, 2009 at 1:06 PM, George Marselis <[email protected]> wrote:

> Hey guys, thanks for all the hard work. I'm working as a sysadmin for a
> shop that specializes in Debian GNU/Linux.
>
> I've got a directory with a couple of tens of millions of files.
It's not such a great idea to do that. Far better to chop that up into at
least 1000 subdirectories. But take care not to use more than about 32,000
subdirectories: some Linux filesystems don't allow more than that (ext3),
or some operations become less efficient past that point (on ext4, the
meaning of st_nlink changes).

> I was trying to find the median access time of the files in that
> directory and sort by percentiles. I got a little Python script
> together, but I can't help thinking that this is a feature that will be
> needed in the future, with bigger filesystems.

I'm not sure determine_median_access_time has such a big potential user
base, to be honest.

I don't know what your performance requirements are, but if performance
is an issue, bear in mind that there are several partitioning (selection)
algorithms that let you find the median of a dataset without fully
sorting it.

James.

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
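The subdirectory split suggested above can be sketched in Python by hashing each filename into a fixed number of shards. This is an illustrative sketch, not coreutils code; the helper names (`shard_path`, `shard_directory`) and the choice of MD5 as the hash are my own assumptions.

```python
import hashlib
import os
import shutil

def shard_path(root, name, nshards=1000):
    """Map a filename to one of `nshards` subdirectories by hashing it.

    Illustrative helper: the shard is the hash value modulo `nshards`,
    formatted as a zero-padded directory name like "042".
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    shard = "%03d" % (int(digest, 16) % nshards)
    return os.path.join(root, shard, name)

def shard_directory(root, nshards=1000):
    """Move every regular file directly under `root` into its hashed shard."""
    for entry in os.listdir(root):
        src = os.path.join(root, entry)
        if not os.path.isfile(src):
            continue  # leave existing subdirectories alone
        dst = shard_path(root, entry, nshards)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
```

Because the shard is derived from the name alone, a file can later be located without scanning: recompute `shard_path(root, name)` and stat it directly.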
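The partitioning approach mentioned above can be sketched with quickselect, which finds the k-th smallest element in expected O(n) time rather than the O(n log n) of a full sort. This is an illustrative sketch, not code from the thread; `median_atime` is a hypothetical helper showing how it would apply to file access times.

```python
import os
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-based) of `values` without
    fully sorting, by recursively partitioning around a random pivot."""
    pivot = random.choice(values)
    below = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    if k < len(below):
        return quickselect(below, k)
    if k < len(below) + len(equal):
        return pivot
    above = [v for v in values if v > pivot]
    return quickselect(above, k - len(below) - len(equal))

def median_atime(directory):
    """Median st_atime of the regular files in `directory`
    (hypothetical helper built on quickselect)."""
    atimes = [e.stat().st_atime for e in os.scandir(directory) if e.is_file()]
    n = len(atimes)
    if n % 2:
        return quickselect(atimes, n // 2)
    return (quickselect(atimes, n // 2 - 1) + quickselect(atimes, n // 2)) / 2
```

Any single percentile works the same way: the p-th percentile is just `quickselect(atimes, k)` for the appropriate rank k, so a handful of percentiles still costs far less than sorting tens of millions of timestamps.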
