Hi Arun,
On Fri, 17 Jan 2020 at 20:29, Arun Isaac <arunis...@systemreboot.net> wrote:

> > 1.
> > How to update the index.
> > Give a look at the "pull" code and the ~/.cache/guix folder.
>
> We don't "update" the index. At every guix pull we create it
> anew. Currently, generate-package-cache in gnu/packages.scm does
> this. generate-package-cache is called by package-cache-file in
> guix/channels.scm. package-cache-file is a channel profile hook listed
> under %channel-profile-hooks.

I would like to be able to search the packages across the whole
commit history, and not only the packages of one specific commit.

> Now, what I am unclear about is how to test my sqlite index building
> code without actually pushing to master and running a guix pull. I will
> go through the various tests in Guix and see if I can figure something
> out, but any pointers would be much appreciated.

To test "guix pull", simply run "make as-derivation".
Disclaimer: it can take some time. :-)

Then the issue is more to avoid polluting your ~/.cache/guix and
~/.config/guix. :-)

 1. Update Guix with the result in /tmp/test

      guix pull -p /tmp/test --url=/path/to/guix/repo

 2. Create your SQL index

      /tmp/test/bin/guix pull -p /tmp/trash

    Now your index should be created with all the packages currently
    in master. To have something reproducible (and faster), I suggest
    adding --commit= and always pulling against the same commit.

 3. Test the index

      /tmp/test/bin/guix search foo

I mean something along these lines. ;-)

> > 2.
> > How to deal with regexp.
> > It is more or less clear to me how to deal with using the trigram keys
> > but I do not know with SQLite; I have not thought about yet.
>
> I think it is not possible to search using regular expressions in sqlite

I think it is possible. I imagine something using multiple queries.
I will have a look at the Guile module.

> I think we should remove regex support altogether.
> I don't think a good
> search interface should expect the user to provide regexes for
> search. Certainly, it will be a lot less useful if and when we have
> xapian. However, just to keep backward compatibility, we can fall back
> to brute force fold-packages search for regexes. As Ludo pointed out, we
> can't remove the brute force code since we need to support cases when
> the cache is not authoritative.

I disagree. We should keep the regexp. Otherwise we cannot include
the index under "guix search" or "guix package --search=", because of
the backward-compatibility argument. The end-user interface (CLI) has
to be exactly the same when using brute force or the index. And the
results too.

> About sqlite versus an inverted index using vhashes, I don't know if it
> is possible to serialize a vhash onto disk. Even if that were possible,
> we'll have to load the entire vhash based inverted index into memory for
> every invocation of guix search, and that could hit
> performance. Something like guile-gdbm could have helped, but that's
> another story.

And your first test was not fair. ;-)
You compared when the hash table was already in memory.
I mean, to know the real performance, only timing can tell. :-)

> I didn't know about sets.scm when I wrote my first proof of concept
> inverted index script. That is why I reinvented the set using hash
> tables. I don't know how hash tables are different from VHashes or which
> is better.

VHashes are a bit confusing to me too. ;-)

https://www.gnu.org/software/guile/manual/html_node/VHashes.html

Cheers,
simon
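
P.S. About regexps in SQLite: the REGEXP operator is only syntax for a
user function named regexp(), which SQLite does not provide by default,
so the application has to register one. Below is a minimal sketch of that
idea in Python, purely for illustration. It is not Guile, and I have not
checked whether the Guile module exposes the same hook. The "packages"
table, its columns and the search helper are hypothetical, not the real
cache layout.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back the SQL REGEXP operator: SQLite calls regexp(pattern, value)."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def search(conn, pattern):
    """Return the sorted names of packages whose name or synopsis match."""
    rows = conn.execute(
        "SELECT name FROM packages"
        " WHERE name REGEXP ? OR synopsis REGEXP ?"
        " ORDER BY name",
        (pattern, pattern),
    )
    return [name for (name,) in rows]


# Hypothetical schema and data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE packages (name TEXT, synopsis TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [
        ("guile", "Scheme implementation"),
        ("guile-sqlite3", "SQLite bindings for Guile"),
        ("emacs", "Extensible text editor"),
    ],
)

print(search(conn, "^guile"))
```

Note that such a function is applied row by row, so a query like this
scans the whole table and does not benefit from any index. It keeps the
regexp interface working, but only timing can tell whether it is faster
than the brute-force search.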