Pierre Neidhardt <m...@ambrevar.xyz> writes:

> By the way, what about using Xapian in Guix?

I looked up Xapian's features at https://xapian.org/features and the list is quite impressive. I was introduced to Xapian through notmuch. notmuch does not use Xapian to the fullest, so I ended up underestimating its value. Of particular importance might be the following.

- Relevance feedback - given one or more documents, Xapian can suggest the most relevant index terms to expand a query, suggest related documents, categorise documents, etc.
- Phrase and proximity searching - users can search for words occurring in an exact phrase or within a specified number of words, either in a specified order, or in any order.
- Supports stemming of search terms (e.g. a search for "football" would match documents which mention "footballs" or "footballer")

I think these features would really help in Pierre's work trying to improve search and discoverability on Guix. If we are planning to have a "Software Center" like interface at some point in the future, Xapian's search could come in handy. Not directly related to Guix, but I also wonder if info manuals would be a lot more useful if they had good full text search using Xapian.

For the time being, since we don't have Xapian bindings, I think we should settle for SQLite's full text search capabilities.

https://www.sqlite.org/fts5.html

I have attached a short proof of concept script for an SQLite based search. For the query in the script, the SQLite search is around 200x faster than a brute force search over all packages, and populating the database only takes around 2.5 seconds. Here is a sample run.

Sqlite database populated in 2.5516340732574463 seconds
Brute force search took 0.11850595474243164 seconds
Sqlite search took 5.459785461425781e-4 seconds

(use-modules (gnu packages)
             (guix packages)
             (ice-9 match)
             (sqlite3)
             (srfi srfi-26))

(define db
  (sqlite-open "/tmp/index.sqlite"))

(define schema
  "CREATE VIRTUAL TABLE packages USING fts5(name, description)")

(define (build-sqlite-database)
  "Create the full text search table and insert the name and description
of every package, all within a single transaction."
  (sqlite-exec db schema)
  (sqlite-exec db "BEGIN")
  (fold-packages
   (lambda (package _)
     (let ((statement
            (sqlite-prepare
             db
             "INSERT INTO packages(name, description) VALUES(:name, :description)")))
       (sqlite-bind-arguments statement
                              #:name (package-name package)
                              #:description (package-description package))
       ;; Folding over the (empty) result set is what executes the INSERT.
       (sqlite-fold cons '() statement)
       (sqlite-finalize statement)))
   #f)
  (sqlite-exec db "COMMIT"))

(define (sqlite-retrieve query)
  "Return names of all packages whose descriptions match the full text
search expression QUERY."
  (let ((statement
         (sqlite-prepare
          db
          "SELECT name FROM packages WHERE description MATCH :query")))
    (sqlite-bind-arguments statement #:query query)
    (let ((result (sqlite-fold (lambda (v result)
                                 (match v
                                   (#(name) (cons name result))))
                               '() statement)))
      (sqlite-finalize statement)
      result)))

(define (brute-force-retrieve query)
  "Return names of all packages whose descriptions contain the string
QUERY.  Search brute force by folding over all packages."
  (fold-packages (lambda (package result)
                   (if (string-contains (package-description package) query)
                       ;; Collect the name, as the docstring promises.
                       (cons (package-name package) result)
                       result))
                 '()))

(define (time-us thunk)
  "Call THUNK and return the elapsed wall-clock time in seconds."
  (define pair->sec
    (match-lambda
      ((sec . usec)
       (+ sec (/ usec 1e6)))))
  (let ((start (gettimeofday)))
    (thunk)
    (let ((stop (gettimeofday)))
      (- (pair->sec stop) (pair->sec start)))))

(format #t "Sqlite database populated in ~a seconds~%\
Brute force search took ~a seconds~%\
Sqlite search took ~a seconds~%"
        (time-us build-sqlite-database)
        (time-us (cut brute-force-retrieve "strategy"))
        (time-us (cut sqlite-retrieve "strategy")))

(sqlite-close db)
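
Two of the Xapian features listed above, stemming and phrase/proximity searching, also have rough counterparts in FTS5 itself: the `porter` tokenizer, quoted phrases, and `NEAR(...)` queries. The following is only a minimal sketch of those, not part of the attached script. It assumes guile-sqlite3 linked against an SQLite built with FTS5, and it uses an in-memory database with three made-up rows (`tactics`, `openings`, `editor`) instead of the real package set. The helper names `add-package!` and `search` are hypothetical.

```scheme
(use-modules (sqlite3)
             (ice-9 match))

;; Assumption: an in-memory database with made-up rows, so the sketch
;; runs without a Guix checkout and leaves nothing on disk.
(define db (sqlite-open ":memory:"))

;; The 'porter' tokenizer wraps unicode61 and adds English stemming.
(sqlite-exec db "CREATE VIRTUAL TABLE packages USING fts5(name, description, tokenize = 'porter unicode61')")

(define (add-package! name description)
  "Insert one hypothetical package row into the index."
  (let ((statement
         (sqlite-prepare
          db
          "INSERT INTO packages(name, description) VALUES(:name, :description)")))
    (sqlite-bind-arguments statement #:name name #:description description)
    (sqlite-fold cons '() statement)
    (sqlite-finalize statement)))

(add-package! "tactics" "Play turn based strategy games against the computer")
(add-package! "openings" "Engine that compares opening strategies")
(add-package! "editor" "Extensible text editor")

(define (search query)
  "Return names of rows matching the FTS5 expression QUERY in any column,
ordered by FTS5's built-in relevance rank."
  (let ((statement
         (sqlite-prepare
          db
          "SELECT name FROM packages WHERE packages MATCH :query ORDER BY rank")))
    (sqlite-bind-arguments statement #:query query)
    (let ((names (sqlite-fold (lambda (row result)
                                (match row
                                  (#(name) (cons name result))))
                              '() statement)))
      (sqlite-finalize statement)
      ;; sqlite-fold conses rows in reverse, so restore the rank order.
      (reverse names))))

;; A stemmed term, an exact phrase, and a proximity query.
(for-each (lambda (query)
            (format #t "~s => ~s~%" query (search query)))
          '("strategy"
            "\"turn based\""
            "NEAR(engine strategies, 5)"))
```

With the stemming tokenizer, a search for "strategy" should also find the row that only mentions "strategies", which the plain schema in the script would miss. This still falls well short of Xapian's relevance feedback, which FTS5 has no equivalent for.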