On 08/18/2011 09:14 PM, Jim Meyering wrote:
> Pádraig Brady wrote:
>>> With the default threshold of 100,000, the maximum is under 30MB
>>> and slightly faster than the first run:
> ...
>>>      0 +----------------------------------------------------------------------->Gi
>>>        0                                                                   4.481
>>
>> Thanks for doing all this.
>> So the above is counting instructions rather than time?
>
> Yes.
> You can change that with this valgrind option:
>
>   --time-unit=<i|ms|B> [default: i]
>     The time unit used for the profiling.  There are three
>     possibilities: instructions executed (i), which is good for most
>     cases; real (wallclock) time (ms, i.e. milliseconds), which is
>     sometimes useful; and bytes allocated/deallocated on the heap
>     and/or stack (B), which is useful for very short-run programs, and
>     for testing purposes, because it is the most reproducible across
>     different machines.
>
>> 30MB is still a large chunk of mem and typically will nuke a cache level,
>> or be much slower on a mem constrained system.
>
> I'd be comfortable with a threshold of 50,000 (i.e., ~15MiB),
> but don't think we need to worry about efficiency when
> memory constrained for such an extreme case.
> For now, my goal is to make the tools more resistant to malfunction
> when given an extreme hierarchy.  Efficiency can come later ;-)
>
>> Perhaps it might be better to try to keep less than 1MB or so,
>> so as to be possibly more cache efficient, which it seems
>> might not impact the number of instructions that much.
>> The output of `perf stat --repeat=5 find ...` would be interesting.
>> I'll try it on my sandy bridge laptop tomorrow evening
>> if you've not got sufficient hardware counters.
>
> Another factor is seek cost vs. the ext3/4 problem that we
> work around by sorting on inode numbers.  The more you subdivide
> and sort independently, the more seek-inducing reads you'll have.
> At least that seems plausible.  I haven't tested enough on spinning
> media.  Note that we don't sort when there are fewer than 10,000
> entries.
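(Aside, for anyone reading the archives: the inode-sort work-around
referred to above amounts to roughly the sketch below.  It's
illustrative only -- made-up names, no error handling, and the only
number taken from the discussion is the 10,000-entry cut-off -- not
the actual gnulib fts code.

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ent { ino_t ino; char *name; };

static int
ino_cmp (void const *a, void const *b)
{
  ino_t ia = ((struct ent const *) a)->ino;
  ino_t ib = ((struct ent const *) b)->ino;
  return ia < ib ? -1 : ib < ia;
}

/* Read all names in DIR, sort them by inode number, then visit them
   in that order, so the inode table is read roughly sequentially.  */
static void
visit_sorted (char const *dir)
{
  DIR *dp = opendir (dir);
  if (!dp)
    return;

  size_t n = 0, cap = 0;
  struct ent *v = NULL;
  struct dirent *de;
  while ((de = readdir (dp)))
    {
      if (strcmp (de->d_name, ".") == 0 || strcmp (de->d_name, "..") == 0)
        continue;
      if (n == cap)
        v = realloc (v, (cap = cap ? 2 * cap : 64) * sizeof *v);
      v[n].ino = de->d_ino;
      v[n].name = strdup (de->d_name);
      n++;
    }
  closedir (dp);

  if (10000 <= n)                 /* don't bother for small directories */
    qsort (v, n, sizeof *v, ino_cmp);

  for (size_t i = 0; i < n; i++)
    {
      puts (v[i].name);           /* stat()/unlink()/... here, in inode order */
      free (v[i].name);
    }
  free (v);
}

The real code is more careful about memory, and d_ino isn't usable on
every file system, but that's the general shape of it.)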
Oops right, 1MiB is probably too low then.
Better to do everything possible to minimize random access,
which even on SSDs can be relatively slow.

> The above *might* be a problem, but the following is a show-stopper.
> Reducing to 1MiB of memory usage would mean using a threshold of about
> 3000 directory entries.  IMHO, that is too low.  That means that for a
> hierarchy like this (with the "d"s being directories and all the
> others files),
>
>     d
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     |\ \
>     d 2 3 ... 5000
>     ...
>
> fts would require one DIR* per level in a depth-first traversal.
> Thus, a tree of depth a little over 1000 would exhaust available
> file descriptors on most systems.  (not tested, yet)
>
> Do we care if someone does the same thing, but with
> 100k-entry directories?  That comes to over 10 million files.
>
> Increase the threshold to 1M, bumping the limit to 100M files?
>
> One possibility is to make the threshold dynamic, balancing
> memory vs. file descriptors.  I.e., start with say 50K, but
> increase the threshold if/as the in-use DIR* count grows.
>
> Which would you rather exhaust first: memory or file descriptors?

File descriptors are usually more constrained, so I'd lean towards
using more memory.

> This is the first time in a long time that I've thought back
> to the pre-fts version of GNU rm.  It did not have this limitation.
> Unlike most other fts clients, rm can avoid nearly all of the overhead
> of recording where it is in the readdir entry stream for each level
> of a hierarchy.  All it had to do was to keep track of the
> entries-per-level that it had failed to remove.

cheers,
Pádraig.
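p.s. to make the "dynamic threshold" idea a bit more concrete, here's
roughly how I'd picture it -- purely a sketch, nothing like this exists
in fts.c, and the numbers and the getrlimit-based budget are made up:

#include <stddef.h>
#include <sys/resource.h>

enum { BASE_THRESHOLD = 50000 };   /* ~15 MiB worst case per batch */

/* Return the per-directory entry threshold to use, given how many
   DIR* streams are currently held open on half-read directories.
   The more fds we have tied up, the larger the threshold, so that
   fewer directories exceed it and need their stream kept open.  */
static size_t
dynamic_threshold (size_t open_dir_streams)
{
  /* Budget: at most half of the fd limit for held-open directories.  */
  size_t fd_budget = 512;
  struct rlimit rl;
  if (getrlimit (RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
    fd_budget = rl.rlim_cur / 2;

  /* Double the threshold as the number of held-open streams
     approaches the budget.  */
  size_t threshold = BASE_THRESHOLD;
  for (size_t used = open_dir_streams; fd_budget / 4 < used; used /= 2)
    threshold *= 2;

  return threshold;
}

i.e. only trade memory for fds once we're actually getting close to
the fd limit.  Whether that's worth the complexity over just picking a
bigger fixed threshold, I don't know.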