On Mon, Jan 1, 2018 at 8:40 AM, Pádraig Brady <[email protected]> wrote:
> tag 29921 notabug
> close 29921
> stop
>
> On 31/12/17 21:07, Niklas Hambüchen wrote:
>> Hello,
>>
>> in commit
>>
>>   rm -r: avoid O(n^2) performance for a directory with very many entries
>>   http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
>>
>> it says that `rm -r`
>>
>>   "now displays linear performance, even when operating on million-entry
>>    directories on ext3 and ext4"
>>
>> I found this not to be the case on my systems running ext4 on Linux 4.9
>> on a WD 4TB spinning disk, coreutils rm 8.28.
>> As reported on https://bugs.python.org/issue32453#msg309303, I got:
>>
>>    nfiles     real    user     sys
>>
>>    100000    0.51s   0.07s   0.43s
>>    200000    2.46s   0.15s   0.89s
>>    400000   10.78s   0.26s   2.21s
>>    800000   44.72s   0.58s   6.03s
>>   1600000  180.37s   1.06s  10.70s
>>
>> Each 2x increase in the number of files results in a 4x increase in
>> deletion time, making for a clean O(n^2) quadratic curve.
>>
>> I'm testing this with:
>>
>>   set -e
>>   rm -rf dirtest/
>>   echo 100000 && (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch) && time rm -r dirtest/
>>   echo 200000 && (mkdir dirtest && cd dirtest && seq 1 200000 | xargs touch) && time rm -r dirtest/
>>   echo 400000 && (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch) && time rm -r dirtest/
>>   echo 800000 && (mkdir dirtest && cd dirtest && seq 1 800000 | xargs touch) && time rm -r dirtest/
>>   echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/
>>
>> On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
>> ext4, I get:
>>
>>    nfiles     real    user      sys
>>
>>    100000    0.94s   0.06s   0.516s
>>    200000    2.94s   0.10s   1.060s
>>    400000   10.88s   0.30s   2.508s
>>    800000   34.60s   0.48s   4.912s
>>   1600000  203.87s   0.99s  11.708s
>>
>> Also quadratic.
>>
>> Same machine on XFS:
>>
>>    nfiles     real    user     sys
>>
>>    100000    3.37s   0.04s   0.98s
>>    200000    7.20s   0.06s   2.03s
>>    400000   22.52s   0.16s   5.11s
>>    800000   53.31s   0.37s  11.46s
>>   1600000  200.83s   0.76s  22.41s
>>
>> Quadratic.
>>
>> What is the complexity of `rm -r` supposed to be?
>> Was it really linear in the past, as the patch describes, and this is a
>> regression?
>> Can we make it work linearly again?
>
> Note that the patch mentioned above sorts by inode order,
> which is usually a win, avoiding random access across the device.
>
> The user and sys components of the above are largely linear,
> so the increasing delay is time spent waiting on the device.
> I presume that's coming from increased latency effects
> as ever larger numbers of cached operations are synced to the device.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the I/O scheduler.
>
> To confirm, I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.
>
> Note that if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.
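The quadratic claim can be checked directly against the numbers above: under O(n^2), each doubling of nfiles should roughly quadruple the elapsed time. A small sketch, using the ext4 "real" column from the first table (the awk one-liner is illustrative only, not part of any coreutils tooling):

```shell
# Step-to-step ratios of the ext4 "real" timings quoted above.
# O(n^2) predicts a ratio near 4 for each doubling of nfiles.
times="0.51 2.46 10.78 44.72 180.37"
ratios=$(echo "$times" |
  awk '{ for (i = 2; i <= NF; i++) printf "%s%.1f", (i > 2 ? " " : ""), $i / $(i - 1) }')
echo "ratios: $ratios"   # prints: ratios: 4.8 4.4 4.1 4.0
```

The ratios hover near 4 rather than 2, which is what makes the curve read as quadratic rather than linear.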
Thanks for the report.
As Pádraig suggests, this looks like it may well represent an
opportunity to improve cache tuning.

All the same, I wanted to confirm that the heuristic of sorting by
inode number is still valuable, so I ran this test: I rebuilt "rm"
with lib/fts.c changed to effectively disable that sorting (i.e.,
never sort, instead of "sort when there are at least 10,000 entries"):

  -# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 10000
  +# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 100000000

Then I ran this test on a Fedora 27, 4.14.6-300.fc27.x86_64 system
using an ext4-formatted spinning disk (Samsung HD204UI).
(btw, "time" here is GNU time, and %e prints elapsed wall-clock seconds)

  $ i=50000; while :; do i=$[i*2]; case $i in 16*) break;; esac; mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" /x/cu/src/rm -rf x; done
  0.49 100000
  16.14 200000
  170.98 400000
  523.55 800000

As you can see, removing that heuristic incurs a very large penalty.
Rerunning those examples with stock rm shows the quadratic behavior
you noted.
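To make the inode-sorting heuristic concrete: the idea in lib/fts.c is to read all of a large directory's entries first and then visit them in inode-number order, so that metadata access tends to sweep the disk sequentially instead of seeking at random. A shell-level sketch of that ordering (illustrative only; rm does this internally through fts, and the file names here are arbitrary):

```shell
# Show a directory's entries in inode order, the visit order the fts
# heuristic uses once a directory exceeds its size threshold.
d=$(mktemp -d)
(cd "$d" && touch zebra apple mango)
by_inode=$(ls -i "$d" | sort -n)   # "inode name" pairs, numerically sorted
echo "$by_inode"
rm -rf "$d"
```

On ext4, inode numbers roughly track the on-disk location of the inode metadata, which is why visiting entries in this order is device-friendly.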
  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.59 200000
  12.69 400000
  35.61 800000

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.52 200000
  11.53 400000
  35.86 800000

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.55 200000
  11.49 400000
  38.86 800000

Even on an SSD (same system as above, with a 3-year-old SSD), I see a
super-linear trend of about O(N^1.5):

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 32*) break;; esac; i=$[i*2]; done
  0.48 100000
  1.26 200000
  3.90 400000
  10.56 800000
  28.13 1600000
  58.39 3200000

Note also that there is a test (tests/rm/ext3-perf.sh) that is intended
to cover this situation. However, back then I made the mistake of using
an absolute time threshold, and with subsequent general system
performance increases it has become next to useless. We should probably
invest in updating that "expensive" test to use relative comparisons,
as is done in grep's tests/long-pattern-perf
(https://git.savannah.gnu.org/cgit/grep.git/tree/tests/long-pattern-perf)
and tests/mb-non-UTF8-performance.

Yes, I was measuring elapsed wall-clock time back when I made those
changes to rm and fts. So I do agree that this is a problem, but I
doubt we can do anything in the coreutils to fix it.
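A relative comparison of the kind proposed for tests/rm/ext3-perf.sh might time rm -r at two sizes on the same machine and bound the ratio, instead of comparing against a fixed number of seconds. A rough sketch under stated assumptions: the sizes, the slack factor, and the elapsed_rm helper are all hypothetical, and the nanosecond timestamps assume GNU date:

```shell
# Remove N files, then 10*N files, and compare elapsed times, so the
# check scales with the machine instead of using an absolute cutoff.
elapsed_rm() {
    d=$(mktemp -d)
    (cd "$d" && seq "$1" | xargs touch)
    start=$(date +%s%N)                  # nanoseconds since epoch (GNU date)
    rm -rf "$d"
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))  # elapsed milliseconds
}
t1=$(elapsed_rm 200)
t2=$(elapsed_rm 2000)
# Linear scaling predicts roughly 10x; allow generous slack before failing.
if [ "$t2" -lt $(( (t1 + 1) * 100 )) ]; then
    echo "scaling OK ($t1 ms vs $t2 ms)"
else
    echo "rm -r scaling looks worse than linear ($t1 ms vs $t2 ms)"
fi
```

At these tiny sizes the timings are noisy; a real test would use much larger N and repeat the measurement before drawing any conclusion.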
