Hi,

On Mon, 05 Oct 2020 at 20:53, Pierre Neidhardt <m...@ambrevar.xyz> wrote:
> - Textual database: slow and not lighter than SQLite.  Not worth it I
>   believe.

Maybe I am out of scope but, re-reading *all* the discussion about
“filesearch”, is it possible to really do better than “locate”, as
Ricardo mentioned?

--8<---------------cut here---------------start------------->8---
$ echo 3 > /proc/sys/vm/drop_caches
$ time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m19.903s
user    0m1.549s
sys     0m4.500s

$ du -sh /gnu/store /tmp/store.db
30G     /gnu/store
56M     /tmp/store.db

$ guix gc -F XXG
$ echo 3 > /proc/sys/vm/drop_caches
$ time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m10.105s
user    0m0.865s
sys     0m2.020s

$ du -sh /gnu/store /tmp/store.db
28G     /gnu/store
52M     /tmp/store.db
--8<---------------cut here---------------end--------------->8---

And then “locate” supports regexps and it is fast enough:

--8<---------------cut here---------------start------------->8---
$ echo 3 > /proc/sys/vm/drop_caches
$ time locate -d /tmp/store.db --regex "emacs-ma[a-z0-9\.\-]+\/[^.]+.el$" | tail -n5
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-transient.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-utils.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-wip.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-worktree.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit.el

real    0m3.601s
user    0m3.528s
sys     0m0.061s
--8<---------------cut here---------------end--------------->8---

My only reservation is that regexps are always cumbersome for me.
Well:

  «Some people, when confronted with a problem, think "I know, I'll use
  regular expressions."  Now they have two problems.» :-) [1]

[1] https://en.wikiquote.org/wiki/Jamie_Zawinski
> - Include synopsis and descriptions.  Maybe we should include all
>   fields that are searched by `guix search`.  This incurs a cost on
>   the database size but it would fix the `guix search` speed issue.
>   Size increases by some 10 MiB.

From my point of view, yes.  Somehow “filesearch” is a subpart of
“search”, so it should share the same machinery.

> I say we go with SQLite full-text search for now with all package
> details.  Switching to without full-text search is just a matter of a
> minor adjustment, which we can decide later when merging the final
> patch.  Same if we decide not to include the description, synopsis,
> etc.

[...]

> - Populate the database on demand, either after a `guix build` or
>   from a `guix filesearch...`.  This is important so that `guix
>   filesearch` works on packages built locally.  If `guix build`, I
>   need help to know where to plug it in.

[...]

> - Sync the databases from the substitute server to the client when
>   running `guix filesearch`.  For this I suggest we send the
>   compressed database corresponding to a guix generation over the
>   network (around 10 MiB).  Not sure sending just the delta is worth
>   it.

From my point of view, how to transfer the database from the substitute
server to users, and how to update it locally (custom channels or a
custom load path), are not easy; maybe these are the core issues.

For example, I just did “guix pull”, and “--list-generations” says that
going from f6dfe42 (Sept. 15) to 4ec2190 (Oct. 10):

  39.9 MB will be downloaded

plus the tiny bits fetched before “Computing Guix derivation”.  Say
50 MB max.  Well, the “locate” database for my /gnu/store (~30 GB) is
already ~50 MB, and ~20 MB when compressed with gzip.  And Pierre said:

  The database with all package descriptions and synopses is 46 MiB and
  compresses down to 11 MiB with zstd.

which is better, but still something.  So, IMHO, it is not affordable
to fetch the database with “guix pull”.  Therefore the database would
be fetched at the first “guix search” (assuming the point above).  But
then, how could “search” know what is custom-built and what is not?
Somehow, “search” would have to scan the whole store to be able to
update the database.  And what happens each time I do a custom build
and then run “filesearch”?  The database should be updated, right?
Then it seems almost unusable.

The “updatedb/locate” model seems better: the user updates the database
“manually” when required, and then locating files is fast.  In most
cases I am searching for files in packages that are not my custom
packages, IMHO.

To me, each time I use “filesearch”:

 - the first time: fetch the database corresponding to the Guix commit,
   then update it with my local store;
 - otherwise: use this database;
 - optionally: update the database if the user wants to include new
   custom items.

We could imagine a hook or an option to “guix pull” specifying to also
fetch the database and update it at pull time instead of at “search”
time.  Personally, I would prefer a longer “guix pull” (it is already a
bit long) and then a fast “search”, rather than half/half (a
not-so-long pull and a longer search).  WDYT?

> - Find a way to garbage-collect the database(s).  My intuition is
>   that we should have 1 database per Guix checkout and when we `guix
>   gc` a Guix checkout we collect the corresponding database.

Well, the exact same strategy as
~/.config/guix/current/lib/guix/package.cache can be used.

BTW, thanks Pierre for improving the Guix discoverability. :-)

Cheers,
simon