Hello Jim and Paul, thank you very much for verifying my observations.
Also, thanks Jim for taking the time to pin down the nature of the issue more precisely: your patch still helps a lot with the large constant factors, but doesn't relieve us of the overall O(n²) problem.

I can also confirm the super-linear performance on SSDs you mentioned. On my systems, I get roughly:

  O(n^2)   on spinning disks
  O(n^1.7) on SSDs

I see this on ext4 on both SSDs and spinning disks, and also on xfs and zfs on spinning disks, but I haven't tried those two on SATA SSDs or NVMe yet. @Jim, if you have your NVMe device available for that, could you check all three of those filesystems on it? If it happens to be linear on NVMe for all filesystem types, that might help narrow down the issue.

> I doubt we can do anything in the coreutils to fix it.

I guess that depends on what we will find the root cause of the issue to be: if there is a favourable deletion order that makes the underlying filesystems behave linearly, then `rm` could try to produce that order, in the same spirit as your previous patch. To take a simple example: if the filesystem implemented its directory lists as a balanced binary tree, then deleting leaves evenly spread across the width of the tree should avoid any rebalancing and result in O(log n) for a single unlink().

> perhaps we should file an ext4 bug report?

OK, I will do that right now. Naively, I would expect that a single unlink() can always be implemented in amortised O(1), or at least O(log n), so that we should end up with O(n) or O(n log n) overall.

Another possibility (though I find it unlikely) is that deletion is in fact O(n), but that the linear behaviour only appears once we get to much larger n. It seems worthwhile to run a deletion loop with ever-increasing n = n*2 overnight.

Thanks!
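P.S. For the overnight measurement, here is a minimal sketch of the doubling-n deletion loop I have in mind. It assumes GNU `seq -f` and `date +%s.%N` (the nanosecond field is a GNU extension); the bounds of 1024 and 8192 are placeholders and would be raised for a real overnight run.

```shell
#!/bin/sh
# Create n files in a fresh directory, then time only the deletion phase.
# Prints "n seconds" per round, for fitting the growth curve afterwards.
bench_rm() {
  n=$1
  dir=$(mktemp -d)
  # seq -f emits one path per line; xargs batches them into touch calls.
  seq -f "$dir/f%.0f" 1 "$n" | xargs touch
  start=$(date +%s.%N)
  rm -rf "$dir"
  end=$(date +%s.%N)
  # awk handles the fractional-second arithmetic portably.
  awk -v n="$n" -v s="$start" -v e="$end" 'BEGIN { printf "%d %.3f\n", n, e - s }'
}

# Double n each round; raise the upper bound for an overnight run.
n=1024
while [ "$n" -le 8192 ]; do
  bench_rm "$n"
  n=$((n * 2))
done
```

If the timings roughly double per round we are looking at O(n); if they quadruple, O(n²).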
