Hello Mark,

Mark H Weaver <m...@netris.org> skribis:

> l...@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.

Right. That’s the only spinning-disk machine I could access without
problem. :-/

Ricardo, Roel: would you be able to run that links-traversal.c from
<https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
on a machine with a big store, as described at
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?

>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full. Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.

Good point. I’m trying my best to get performance figures; there’s no
doubt we could do better!

> In my opinion, the reason Hydra performs so poorly is because efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it. Unfortunately, I feel that my advice in this
> area is discarded more often than not.

Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that far from
discarding your advice, I very much value it.

I’m a maintainer though, so I’m trying to understand the problem better.
It’s not just about finding the “optimal” solution, but also about
finding a tradeoff between the benefits and the maintainability costs.

>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that. Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes? Can you explain why?

I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you? :-)

It remains that, if invoking ‘sort’ appears to be preferable *both* from
performance and maintenance viewpoints, then it’s a good choice. That
may be the case, but again, I prefer to have figures to back that.

>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use. A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,

OK, good to know.

Thanks!

Ludo’.