bug#24937: "deleting unused links" GC phase is too slow

2021-11-16 Thread Ludovic Courtès
Hi, Ludovic Courtès skribis: > Files smaller than 4 KiB typically represent ~60% of the entries in > /gnu/store/.links but only contribute to ~2.5% of the space savings > afforded by deduplication. > > Not considering these files for deduplication speeds up file insertion > in the store and, mor

bug#24937: "deleting unused links" GC phase is too slow

2021-11-13 Thread Ludovic Courtès
Ludovic Courtès skribis: > In a nutshell: > > • Files < 1KiB contribute to 0.3% of the space savings; > > • Files < 4KiB contribute to 2.5% of the space savings; I get similar results on bayfront.guix.gnu.org (with 3.2M entries): … and on guix.bordeaux.inria.fr (2.0M entries): Files < 4K
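The percentages quoted above can be reproduced with a small calculation. The following Python sketch is only an illustration, not the script used in the thread: the function name `savings_share` is hypothetical, and the savings formula (each hard link beyond the `.links` entry and the first store copy avoids one copy of the file) is an assumption.

```python
def savings_share(files, threshold=4096):
    """Summarise how much small files contribute to deduplication.

    `files` is an iterable of (size_in_bytes, link_count) pairs, one per
    entry in the links directory.  Returns a pair:
    (share of entries below `threshold`, share of saved bytes they provide).
    """
    total_n = small_n = 0
    total_saved = small_saved = 0
    for size, nlink in files:
        # Assumption: one link is the .links entry, one is the first store
        # copy; every additional link avoids storing the file again.
        saved = size * max(nlink - 2, 0)
        total_n += 1
        total_saved += saved
        if size < threshold:
            small_n += 1
            small_saved += saved
    entry_share = small_n / total_n if total_n else 0.0
    saved_share = small_saved / total_saved if total_saved else 0.0
    return entry_share, saved_share
```

Fed with one `(st_size, st_nlink)` pair per entry, a result where the first number is large and the second is small would match the observation in the thread that small files make up many entries but save little space.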

bug#24937: "deleting unused links" GC phase is too slow

2021-11-13 Thread Ludovic Courtès
Hi, Maxim Cournoyer skribis: > I haven't done any analysis, just grabbed the result, but here is what > it looks like for me: There’s a bit more than 35% of deduplicated files that are < 1KiB, and not much to be gained by deduplicating them. On IRC several people shared the results on their machine

bug#24937: "deleting unused links" GC phase is too slow

2021-11-09 Thread Ludovic Courtès
Ludovic Courtès skribis: > On my laptop, we’re talking about space savings of 325 MiB, a tiny > fraction of my store: > > scheme@(guile-user)> (saved-space (filter (lambda (file) > (< (deduplicated-file-size file) > 1024)) >

bug#24937: "deleting unused links" GC phase is too slow

2021-11-09 Thread Ludovic Courtès
Hi! l...@gnu.org (Ludovic Courtès) skribis: > ‘LocalStore::removeUnusedLinks’ traverses all the entries in > /gnu/store/.links and calls lstat(2) on each one of them and checks > ‘st_nlink’ to determine whether they can be deleted. > > There are two problems: lstat(2) can be slow on spinning disk

bug#24937: "deleting unused links" GC phase is too slow

2020-04-17 Thread Ricardo Wurmus
Ludovic Courtès writes: >> root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l >> 2017395 > > That’s not a lot, my laptop has 2.8M links. Let me rerun this after copying a few thousand store items from ci.guix.gnu.org over. Maybe we’ll see the different times diverge then. -- Ricardo

bug#24937: "deleting unused links" GC phase is too slow

2020-04-17 Thread Ludovic Courtès
Hi Ricardo, Thanks for running this benchmark! Ricardo Wurmus skribis: > root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l > 2017395 That’s not a lot, my laptop has 2.8M links. It’s interesting to see that system time remains at ~4.2s in all modes. So the only thing that modes 2 and

bug#24937: "deleting unused links" GC phase is too slow

2020-04-16 Thread Ricardo Wurmus
Here are more benchmarks on one of the build nodes. It doesn’t nearly have as many used inodes as ci.guix.gnu.org, but I could fill it up if necessary. root@hydra-guix-127 ~# df -i /gnu/ Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sda3 28950528 2796829 26153699 10%

bug#24937: "deleting unused links" GC phase is too slow

2020-04-16 Thread Ricardo Wurmus
Ricardo Wurmus writes: > Ludovic Courtès writes: > >> Ricardo, Roel: would you be able to run that links-traversal.c from >> >> on a machine with a big store, as described at >>

bug#24937: "deleting unused links" GC phase is too slow

2016-12-14 Thread Mark H Weaver
I apologize for losing my patience earlier. Mark

bug#24937: "deleting unused links" GC phase is too slow

2016-12-13 Thread Ricardo Wurmus
Ludovic Courtès writes: > Ricardo, Roel: would you be able to run that links-traversal.c from > > on a machine with a big store, as described at >

bug#24937: "deleting unused links" GC phase is too slow

2016-12-13 Thread Ludovic Courtès
Hello Mark, Mark H Weaver skribis: > l...@gnu.org (Ludovic Courtès) writes: > >> I did some measurements with the attached program on chapters, which is >> a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It >> has 600k entries in /gnu/store/.links. > > I just want to point ou

bug#24937: "deleting unused links" GC phase is too slow

2016-12-13 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes: > I did some measurements with the attached program on chapters, which is > a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It > has 600k entries in /gnu/store/.links. I just want to point out that 600k inodes use 150 megabytes of disk sp

bug#24937: "deleting unused links" GC phase is too slow

2016-12-12 Thread Mark H Weaver
Do as you wish. I don't have time to continue discussing this. Mark

bug#24937: "deleting unused links" GC phase is too slow

2016-12-12 Thread Ludovic Courtès
Mark H Weaver skribis: > l...@gnu.org (Ludovic Courtès) writes: > >> Mark H Weaver skribis: >> >>> I think we should sort the entire directory using merge sort backed to >>> disk files. If we load chunks of the directory, sort them and process >>> them individually, I expect that this will incr

bug#24937: "deleting unused links" GC phase is too slow

2016-12-11 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes: > Mark H Weaver skribis: > >> I think we should sort the entire directory using merge sort backed to >> disk files. If we load chunks of the directory, sort them and process >> them individually, I expect that this will increase the amount of I/O >> require

bug#24937: "deleting unused links" GC phase is too slow

2016-12-11 Thread Ludovic Courtès
Mark H Weaver skribis: > l...@gnu.org (Ludovic Courtès) writes: > >> Here’s a proposed patch that follows your suggestion, Mark, but places >> an upper bound on the number of directory entries loaded in memory. >> >> On my laptop, which has ~500k entries in /gnu/store/.links, the result >> is som

bug#24937: "deleting unused links" GC phase is too slow

2016-12-11 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes: > Here’s a proposed patch that follows your suggestion, Mark, but places > an upper bound on the number of directory entries loaded in memory. > > On my laptop, which has ~500k entries in /gnu/store/.links, the result > is something like this (notice the inod

bug#24937: "deleting unused links" GC phase is too slow

2016-12-11 Thread Ludovic Courtès
Hello! Here’s a proposed patch that follows your suggestion, Mark, but places an upper bound on the number of directory entries loaded in memory. On my laptop, which has ~500k entries in /gnu/store/.links, the result is something like this (notice the inode numbers in ‘lstat’ calls): --8<---
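The idea described in this message (visit entries in inode order while bounding how many directory entries are held in memory) might look roughly like the Python sketch below. It is not the proposed patch, which targets the daemon's C++ code; the function name `entries_by_inode` and the default `max_entries` value are hypothetical.

```python
import os
from itertools import islice


def entries_by_inode(directory, max_entries=100_000):
    """Yield (inode, name) pairs for `directory`.

    Entries are read in batches of at most `max_entries`; each batch is
    sorted by inode number before being yielded, so memory use stays
    bounded while later lstat calls within a batch follow inode order.
    """
    with os.scandir(directory) as it:
        while True:
            # The inode number comes from the directory entry itself.
            batch = [(entry.inode(), entry.name)
                     for entry in islice(it, max_entries)]
            if not batch:
                break
            batch.sort()
            yield from batch
```

Ordering is only guaranteed within a batch, not across the whole directory, which is the trade-off between this bounded approach and a full sort of the directory.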

bug#24937: "deleting unused links" GC phase is too slow

2016-12-09 Thread Ludovic Courtès
l...@gnu.org (Ludovic Courtès) skribis: > ‘LocalStore::removeUnusedLinks’ traverses all the entries in > /gnu/store/.links and calls lstat(2) on each one of them and checks > ‘st_nlink’ to determine whether they can be deleted. > > There are two problems: lstat(2) can be slow on spinning disks as

bug#24937: "deleting unused links" GC phase is too slow

2016-11-13 Thread Ludovic Courtès
‘LocalStore::removeUnusedLinks’ traverses all the entries in /gnu/store/.links and calls lstat(2) on each one of them and checks ‘st_nlink’ to determine whether they can be deleted. There are two problems: lstat(2) can be slow on spinning disks as found on hydra.gnu.org, and the algorithm is propo
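As a rough illustration of the traversal described in this message, here is a minimal Python sketch. It is not the daemon's C++ implementation; the function name `remove_unused_links` is hypothetical, and the deletion criterion (a link count of exactly 1 means no store file shares the inode) is an assumption made for the example.

```python
import os


def remove_unused_links(links_dir):
    """Delete entries of `links_dir` that nothing else refers to.

    Returns the sorted names of the removed entries.
    """
    removed = []
    for name in os.listdir(links_dir):
        path = os.path.join(links_dir, name)
        st = os.lstat(path)      # one lstat(2) call per directory entry
        if st.st_nlink == 1:     # assumption: only this entry uses the inode
            os.unlink(path)
            removed.append(name)
    return sorted(removed)
```

The cost of this loop grows with the number of entries in the directory, regardless of how many of them end up being deleted.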