On 08/18/2011 09:14 PM, Jim Meyering wrote:
> Pádraig Brady wrote:
>>> With the default threshold of 100,000, the maximum is under 30MB
>>> and slightly faster than the first run:
> ...
>>>      0 +----------------------------------------------------------------------->Gi
>>>        0                                                                   4.481
>>
>> Thanks for doing all this.
>> So the above is counting instructions rather than time?
>
> Yes.
> You can change that with this valgrind option:
>
>   --time-unit=<i|ms|B> [default: i]
>     The time unit used for the profiling.  There are three
>     possibilities: instructions executed (i), which is good for most
>     cases; real (wallclock) time (ms, i.e. milliseconds), which is
>     sometimes useful; and bytes allocated/deallocated on the heap
>     and/or stack (B), which is useful for very short-run programs, and
>     for testing purposes, because it is the most reproducible across
>     different machines.
>
>> 30MB is still a large chunk of mem and typically will nuke a cache level,
>> or be much slower on a mem constrained system.
>
> I'd be comfortable with a threshold of 50,000 (i.e., ~15MiB),
> but don't think we need to worry about efficiency when
> memory constrained for such an extreme case.
> For now, my goal is to make the tools more resistant to malfunction
> when given an extreme hierarchy.  Efficiency can come later ;-)
>
>> Perhaps it might be better to try to keep less than 1MB or so,
>> so as to be possibly more cache efficient, which it seems
>> might not impact the number of instructions that much.
>> The output of `perf stat --repeat=5 find ...` would be interesting.
>> I'll try it on my sandy bridge laptop tomorrow evening
>> if you've not got sufficient hardware counters.
>
> Another factor is seek cost vs. the ext3/4 problem that we
> work around by sorting on inode numbers.  The more you subdivide
> and sort independently, the more seek-inducing reads you'll have.
> At least that seems plausible.  I haven't tested enough on spinning
> media.  Note that we don't sort when there are fewer than 10,000
> entries.
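(Aside, for anyone reading the archives: the inode-sort work-around
referred to above amounts to roughly the sketch below.  It's
illustrative only -- made-up names, no error handling, and the only
number taken from the discussion is the 10,000-entry cut-off -- not
the actual gnulib fts code.

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ent { ino_t ino; char *name; };

static int
ino_cmp (void const *a, void const *b)
{
  ino_t ia = ((struct ent const *) a)->ino;
  ino_t ib = ((struct ent const *) b)->ino;
  return ia < ib ? -1 : ib < ia;
}

/* Read all names in DIR, sort them by inode number, then visit them
   in that order, so the inode table is read roughly sequentially.  */
static void
visit_sorted (char const *dir)
{
  DIR *dp = opendir (dir);
  if (!dp)
    return;

  size_t n = 0, cap = 0;
  struct ent *v = NULL;
  struct dirent *de;
  while ((de = readdir (dp)))
    {
      if (strcmp (de->d_name, ".") == 0 || strcmp (de->d_name, "..") == 0)
        continue;
      if (n == cap)
        v = realloc (v, (cap = cap ? 2 * cap : 64) * sizeof *v);
      v[n].ino = de->d_ino;
      v[n].name = strdup (de->d_name);
      n++;
    }
  closedir (dp);

  if (10000 <= n)                 /* don't bother for small directories */
    qsort (v, n, sizeof *v, ino_cmp);

  for (size_t i = 0; i < n; i++)
    {
      puts (v[i].name);           /* stat()/unlink()/... here, in inode order */
      free (v[i].name);
    }
  free (v);
}

The real code is more careful about memory, and d_ino isn't usable on
every file system, but that's the general shape of it.)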
Oops right, 1MiB is probably too low then.
Better to do everything possible to minimize random access,
which even on SSDs can be relatively slow.

> The above *might* be a problem, but the following is a show-stopper.
> Reducing to 1MiB of memory usage would mean using a threshold of about
> 3000 directory entries.  IMHO, that is too low.  That means that for a
> hierarchy like this (with the "d"s being directories and all the
> others files),
>
>     d
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     ...
>
> fts would require one DIR* per level in a depth-first traversal.
> Thus, a tree of depth a little over 1000 would exhaust available
> file descriptors on most systems.  (not tested, yet)
>
> Do we care if someone does the same thing, but with
> 100k-entry directories?  That comes to over 10 million files.
>
> Increase the threshold to 1M, bumping the limit to 100M files?
>
> One possibility is to make the threshold dynamic, balancing
> memory vs. file descriptors.  I.e., start with say 50K, but
> increase the threshold if/as the in-use DIR* count grows.
>
> Which would you rather exhaust first: memory or file descriptors?

File descriptors are usually more constrained, so I'd lean towards
using more memory.

> This is the first time in a long time that I've thought back
> to the pre-fts version of GNU rm.  It did not have this limitation.
> Unlike most other fts clients, rm can avoid nearly all of the overhead
> of recording where it is in the readdir entry stream for each level
> of a hierarchy.  All it had to do was to keep track of the
> entries-per-level that it had failed to remove.

cheers,
Pádraig.
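p.s. to make the "dynamic threshold" idea a bit more concrete, here's
roughly how I'd picture it -- purely a sketch, nothing like this exists
in fts.c, and the numbers and the getrlimit-based budget are made up:

#include <stddef.h>
#include <sys/resource.h>

enum { BASE_THRESHOLD = 50000 };   /* ~15 MiB worst case per batch */

/* Return the per-directory entry threshold to use, given how many
   DIR* streams are currently held open on half-read directories.
   The more fds we have tied up, the larger the threshold, so that
   fewer directories exceed it and need their stream kept open.  */
static size_t
dynamic_threshold (size_t open_dir_streams)
{
  /* Budget: at most half of the fd limit for held-open directories.  */
  size_t fd_budget = 512;
  struct rlimit rl;
  if (getrlimit (RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
    fd_budget = rl.rlim_cur / 2;

  /* Double the threshold as the number of held-open streams
     approaches the budget.  */
  size_t threshold = BASE_THRESHOLD;
  for (size_t used = open_dir_streams; fd_budget / 4 < used; used /= 2)
    threshold *= 2;

  return threshold;
}

i.e. only trade memory for fds once we're actually getting close to
the fd limit.  Whether that's worth the complexity over just picking a
bigger fixed threshold, I don't know.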