On Mon, Jan 1, 2018 at 8:40 AM, Pádraig Brady <[email protected]> wrote:
> tag 29921 notabug
> close 29921
> stop
>
> On 31/12/17 21:07, Niklas Hambüchen wrote:
>> Hello,
>>
>> in commit
>>
>>   rm -r: avoid O(n^2) performance for a directory with very many entries
>>   http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
>>
>> it says that `rm -r`
>>
>>   "now displays linear performance, even when operating on million-entry
>>    directories on ext3 and ext4"
>>
>> I found this not to be the case on my systems running ext4 on Linux 4.9
>> on a WD 4TB spinning disk, coreutils rm 8.28.
>> As reported on https://bugs.python.org/issue32453#msg309303, I got:
>>
>>    nfiles     real    user     sys
>>
>>    100000    0.51s   0.07s   0.43s
>>    200000    2.46s   0.15s   0.89s
>>    400000   10.78s   0.26s   2.21s
>>    800000   44.72s   0.58s   6.03s
>>   1600000  180.37s   1.06s  10.70s
>>
>> Each 2x increase in the number of files results in a 4x increase in
>> deletion time, making for a clean O(n^2) quadratic curve.
>>
>> I'm testing this with:
>>
>>   set -e
>>   rm -rf dirtest/
>>   echo 100000 && (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch) && time rm -r dirtest/
>>   echo 200000 && (mkdir dirtest && cd dirtest && seq 1 200000 | xargs touch) && time rm -r dirtest/
>>   echo 400000 && (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch) && time rm -r dirtest/
>>   echo 800000 && (mkdir dirtest && cd dirtest && seq 1 800000 | xargs touch) && time rm -r dirtest/
>>   echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/
>>
>> On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
>> ext4, I get:
>>
>>    nfiles     real    user      sys
>>
>>    100000    0.94s   0.06s   0.516s
>>    200000    2.94s   0.10s   1.060s
>>    400000   10.88s   0.30s   2.508s
>>    800000   34.60s   0.48s   4.912s
>>   1600000  203.87s   0.99s  11.708s
>>
>> Also quadratic.
>>
>> Same machine on XFS:
>>
>>    nfiles     real    user     sys
>>
>>    100000    3.37s   0.04s   0.98s
>>    200000    7.20s   0.06s   2.03s
>>    400000   22.52s   0.16s   5.11s
>>    800000   53.31s   0.37s  11.46s
>>   1600000  200.83s   0.76s  22.41s
>>
>> Quadratic.
>>
>> What is the complexity of `rm -r` supposed to be?
>> Was it really linear in the past, as the patch describes, and this is a
>> regression?
>> Can we make it work linearly again?
>
> Note that the patch mentioned above sorts by inode order,
> which is usually a win, avoiding random access across the device.
>
> The user and sys components of the above are largely linear,
> so the increasing delay is time spent waiting on the device.
> I presume that's coming from increased latency effects
> as ever larger numbers of cached operations are synced to the device.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the I/O scheduler.
>
> To confirm, I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.
>
> Note that if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.
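The quadratic claim can be checked directly against the numbers above: under O(n^2), each doubling of nfiles should roughly quadruple the elapsed time. A small sketch, using the ext4 "real" column from the first table (the awk one-liner is illustrative only, not part of any coreutils tooling):

```shell
# Step-to-step ratios of the ext4 "real" timings quoted above.
# O(n^2) predicts a ratio near 4 for each doubling of nfiles.
times="0.51 2.46 10.78 44.72 180.37"
ratios=$(echo "$times" |
  awk '{ for (i = 2; i <= NF; i++) printf "%s%.1f", (i > 2 ? " " : ""), $i / $(i - 1) }')
echo "ratios: $ratios"   # prints: ratios: 4.8 4.4 4.1 4.0
```

The ratios hover near 4 rather than 2, which is what makes the curve read as quadratic rather than linear.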
Thanks for the report.
As Pádraig suggests, this looks like it may well represent an
opportunity to improve cache tuning.

All the same, I wanted to confirm that the heuristic of sorting by
inode number is still valuable, so I ran this test: I rebuilt "rm"
with lib/fts.c changed to effectively disable that sorting (i.e.,
never sort, instead of "sort when there are at least 10,000 entries"):

  -# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 10000
  +# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 100000000

Then I ran this test on a Fedora 27, 4.14.6-300.fc27.x86_64 system
using an ext4-formatted spinning disk (Samsung HD204UI).
(btw, "time" here is GNU time, and %e prints elapsed wall-clock seconds)

  $ i=50000; while :; do i=$[i*2]; case $i in 16*) break;; esac; mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" /x/cu/src/rm -rf x; done
  0.49 100000
  16.14 200000
  170.98 400000
  523.55 800000

As you can see, removing that heuristic incurs a very large penalty.
Rerunning those examples with stock rm shows the quadratic behavior
you noted.
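To make the inode-sorting heuristic concrete: the idea in lib/fts.c is to read all of a large directory's entries first and then visit them in inode-number order, so that metadata access tends to sweep the disk sequentially instead of seeking at random. A shell-level sketch of that ordering (illustrative only; rm does this internally through fts, and the file names here are arbitrary):

```shell
# Show a directory's entries in inode order, the visit order the fts
# heuristic uses once a directory exceeds its size threshold.
d=$(mktemp -d)
(cd "$d" && touch zebra apple mango)
by_inode=$(ls -i "$d" | sort -n)   # "inode name" pairs, numerically sorted
echo "$by_inode"
rm -rf "$d"
```

On ext4, inode numbers roughly track the on-disk location of the inode metadata, which is why visiting entries in this order is device-friendly.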
  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.59 200000
  12.69 400000
  35.61 800000

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.52 200000
  11.53 400000
  35.86 800000

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
  0.48 100000
  2.55 200000
  11.49 400000
  38.86 800000

Even on an SSD (same system as above, with a 3-year-old SSD), I see a
super-linear trend of about O(N^1.5):

  $ i=100000; while :; do mkdir x; (cd x && seq $i | xargs touch); env time -f "%e $i" env rm -rf x; case $i in 32*) break;; esac; i=$[i*2]; done
  0.48 100000
  1.26 200000
  3.90 400000
  10.56 800000
  28.13 1600000
  58.39 3200000

Note also that there is a test (tests/rm/ext3-perf.sh) that is intended
to cover this situation. However, back then I made the mistake of using
an absolute time threshold, and with subsequent general system
performance increases it has become next to useless. We should probably
invest in updating that "expensive" test to use relative comparisons,
as is done in grep's tests/long-pattern-perf
(https://git.savannah.gnu.org/cgit/grep.git/tree/tests/long-pattern-perf)
and tests/mb-non-UTF8-performance.

Yes, I was measuring elapsed wall-clock time back when I made those
changes to rm and fts. So I do agree that this is a problem, but I
doubt we can do anything in the coreutils to fix it.
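A relative comparison of the kind proposed for tests/rm/ext3-perf.sh might time rm -r at two sizes on the same machine and bound the ratio, instead of comparing against a fixed number of seconds. A rough sketch under stated assumptions: the sizes, the slack factor, and the elapsed_rm helper are all hypothetical, and the nanosecond timestamps assume GNU date:

```shell
# Remove N files, then 10*N files, and compare elapsed times, so the
# check scales with the machine instead of using an absolute cutoff.
elapsed_rm() {
    d=$(mktemp -d)
    (cd "$d" && seq "$1" | xargs touch)
    start=$(date +%s%N)                  # nanoseconds since epoch (GNU date)
    rm -rf "$d"
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))  # elapsed milliseconds
}
t1=$(elapsed_rm 200)
t2=$(elapsed_rm 2000)
# Linear scaling predicts roughly 10x; allow generous slack before failing.
if [ "$t2" -lt $(( (t1 + 1) * 100 )) ]; then
    echo "scaling OK ($t1 ms vs $t2 ms)"
else
    echo "rm -r scaling looks worse than linear ($t1 ms vs $t2 ms)"
fi
```

At these tiny sizes the timings are noisy; a real test would use much larger N and repeat the measurement before drawing any conclusion.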
