Pádraig Brady wrote:
>> With the default threshold of 100,000, the maximum is under 30MB
>> and slightly faster than the first run:
[massif graph snipped; its x-axis counts instructions, from 0 to 4.481 Gi]
>
> Thanks for doing all this.
> So the above is counting instructions rather than time?
Yes.  You can change that with this valgrind option:

    --time-unit=<i|ms|B> [default: i]
        The time unit used for the profiling.  There are three
        possibilities: instructions executed (i), which is good for most
        cases; real (wallclock) time (ms, i.e. milliseconds), which is
        sometimes useful; and bytes allocated/deallocated on the heap
        and/or stack (B), which is useful for very short-run programs,
        and for testing purposes, because it is the most reproducible
        across different machines.

> 30MB is still a large chunk of mem and typically will nuke a cache level,
> or be much slower on a mem constrained system.

I'd be comfortable with a threshold of 50,000 (i.e., ~15MiB), but I don't
think we need to worry about efficiency on a memory-constrained system in
such an extreme case.  For now, my goal is to make the tools more
resistant to malfunction when given an extreme hierarchy.  Efficiency can
come later ;-)

> Perhaps it might be better to try to keep less than 1MB or so,
> so as to be possibly more cache efficient, which seems
> might not impact the number of instructions that much.
> The output of `perf stat --repeat=5 find ...` would be interesting.
> I'll try it on my sandy bridge laptop tomorrow evening
> if you've not got sufficient hardware counters.

Another factor is seek cost vs. the ext3/4 problem that we work around by
sorting on inode numbers.  The more you subdivide and sort independently,
the more seek-inducing reads you'll have.  At least that seems plausible;
I haven't tested enough on spinning media.  Note that we don't sort when
there are fewer than 10,000 entries.

The above *might* be a problem, but the following is a show-stopper.

Reducing to 1MiB of memory usage would mean using a threshold of about
3000 directory entries.  IMHO, that is too low.  It means that for a
hierarchy like this (with the "d"s being directories and all the others
files),

    d
    |\ \
    d 2 3 ... 5000
    |\ \
    d 2 3 ... 5000
    |\ \
    d 2 3 ... 5000
    |\ \
    d 2 3 ... 5000
    |\ \
    d 2 3 ... 5000
    |\ \
    d 2 3 ... 5000
    ...

fts would require one DIR* per level in a depth-first traversal.  Thus, a
tree of depth a little over 1000 would exhaust available file descriptors
on most systems.  (not tested, yet)

Do we care if someone does the same thing, but with 100k-entry
directories?  That comes to over 10 million files.  Increase the threshold
to 1M, bumping the limit to 100M files?

One possibility is to make the threshold dynamic, balancing memory vs.
file descriptors, i.e., start with say 50K, but increase the threshold
if/as the in-use DIR* count grows (rough sketch at the end of this
message).  Which would you rather exhaust first: memory or file
descriptors?

This is the first time in a long time that I've thought back to the
pre-fts version of GNU rm.  It did not have this limitation.  Unlike most
other fts clients, rm can avoid nearly all of the overhead of recording
where it is in the readdir entry stream for each level of a hierarchy.
All it had to do was keep track of the entries per level that it had
failed to remove.
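To make the dynamic-threshold idea a little more concrete, here is a rough
sketch.  The names and the growth rule are invented for illustration (this
is not fts's actual interface); the only point is that the
read-the-whole-directory-at-once threshold rises as more directory streams
are held open, trading memory for file descriptors:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical starting threshold of 50,000 entries, i.e., roughly
       15 MiB at ~300 bytes per entry.  Directories smaller than the
       threshold would be read (and sorted) in full, so their DIR* can be
       closed before descending; larger ones would be read in batches,
       which keeps one DIR* open per level.  */
    enum { INITIAL_THRESHOLD = 50 * 1000 };

    static size_t
    dirent_threshold (size_t open_dir_streams)
    {
      size_t threshold = INITIAL_THRESHOLD;

      /* Double the threshold for every 100 simultaneously open directory
         streams -- an arbitrary growth rule, chosen only to show the
         shape of the trade-off.  */
      for (size_t n = open_dir_streams; 100 <= n; n -= 100)
        if (threshold <= SIZE_MAX / 2)
          threshold *= 2;

      return threshold;
    }

Whether the growth should be driven by the open-stream count, by
getrlimit (RLIMIT_NOFILE), or by both is really the same
memory-vs-descriptors question.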
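And, purely to illustrate that last paragraph: the per-level bookkeeping
can be as simple as a list of the names that could not be removed, because
everything successfully removed disappears from the directory, so a later
re-scan will never see it again.  This is not the old rm code, just a
sketch of the idea (a real implementation would want a hash table rather
than a linear list):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* One node per entry, at a given level, that we failed to remove.  */
    struct failed_name
    {
      struct failed_name *next;
      char name[];                  /* flexible array member */
    };

    /* Record NAME in *LIST.  Return true on success, false on
       allocation failure.  */
    static bool
    record_failure (struct failed_name **list, char const *name)
    {
      size_t len = strlen (name) + 1;
      struct failed_name *p = malloc (sizeof *p + len);
      if (!p)
        return false;
      memcpy (p->name, name, len);
      p->next = *list;
      *list = p;
      return true;
    }

    /* Return true if NAME was recorded earlier, i.e., a re-scan of this
       directory should skip it rather than retry the removal.  */
    static bool
    already_failed (struct failed_name const *list, char const *name)
    {
      for (; list; list = list->next)
        if (strcmp (list->name, name) == 0)
          return true;
      return false;
    }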