Maxim Cournoyer <maxim.courno...@gmail.com> skribis:

> I also had the idea of making it a package... this way only the people
> who opt to install the database locally would incur the cost (in
> bandwidth).
>
> Perhaps a question for Vagrant: talking about size, is this SQLite
> database file comparable or smaller in size to the apt-file database
> that needs to be downloaded? With the Debian software catalog being
> about 30% bigger, I'd expect a similarly bigger file size.
>
> If Debian is doing better in terms of database file size, we could look
> at how they're doing it.
As a back-of-the-envelope estimate, here’s the amount of text that needs
to be available in the database:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories;select name from files;'|wc -c
197689978
ludo@berlin ~/src$ guile -c '(pk (/ 197689978 (expt 2. 20)))'

;;; (188.5318546295166)
ludo@berlin ~/src$ du -h /tmp/db
389M	/tmp/db
--8<---------------cut here---------------end--------------->8---

So roughly, SQLite with this particular schema ends up taking twice as
much space as the lower bound.

We can do a bit better (I’m not an expert, so I’m just trying things
naively) by dropping the index and cleaning up the database:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ cp /tmp/db{,.without-index}
ludo@berlin ~/src$ sqlite3 /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> drop index IndexFiles;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index
389M	/tmp/db.without-index
ludo@berlin ~/src$ sqlite3 /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> vacuum;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index
290M	/tmp/db.without-index
--8<---------------cut here---------------end--------------->8---

With compression:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ zstd -19 < /tmp/db.without-index > /tmp/db.without-index.zst
ludo@berlin ~/src$ du -h /tmp/db.without-index.zst
37M	/tmp/db.without-index.zst
--8<---------------cut here---------------end--------------->8---

(Down from 61MB.)  For comparison, this is smaller than guile, perl,
gtk+, and roughly the same as glibc:out.
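Why does `drop index` alone leave the file at 389M while `vacuum` brings
it down to 290M?  With auto_vacuum disabled (SQLite’s usual default),
the pages freed by dropping the index go onto a freelist inside the
database file and are only discarded when VACUUM rewrites the file.
Here is a small Python sketch, using only the standard `sqlite3` module,
that reproduces this on a toy database.  The schema, the row count, and
the file names are made up; only the names `files` and `IndexFiles` echo
the session above.

```python
import os
import sqlite3
import tempfile


def sizes_after_drop_and_vacuum(path, n_rows=20000):
    """Return (with_index, after_drop, after_vacuum) file sizes in bytes.

    Toy schema and data, for illustration only.
    """
    conn = sqlite3.connect(path)
    # Keep freed pages in the file instead of reclaiming them
    # automatically; this must be set before any table is created.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO files (name) VALUES (?)",
        ((f"/hypothetical/store/item-{i}/share/doc/file-{i}.txt",)
         for i in range(n_rows)),
    )
    conn.execute("CREATE INDEX IndexFiles ON files (name)")
    conn.commit()
    with_index = os.path.getsize(path)

    # The index's pages move to the freelist but stay in the file.
    conn.execute("DROP INDEX IndexFiles")
    conn.commit()
    after_drop = os.path.getsize(path)

    # VACUUM rebuilds the database file without the free pages.
    conn.execute("VACUUM")
    after_vacuum = os.path.getsize(path)
    conn.close()
    return with_index, after_drop, after_vacuum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sizes = sizes_after_drop_and_vacuum(os.path.join(tmp, "toy.db"))
        print("with index: %d, after drop: %d, after vacuum: %d" % sizes)
```

On a database created with auto_vacuum enabled, the numbers would look
different, which is one more knob worth checking on the real file.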
For the record, with compression, the lower bound is about 12 MiB:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories;select name from files;'|zstd -19|wc -c
12128674
ludo@berlin ~/src$ guile -c '(pk (/ 12128674 (expt 2. 20)))'

;;; (11.566804885864258)
--8<---------------cut here---------------end--------------->8---

All this to say that we could distribute the database in a form that
gets closer to the optimal size, at the expense of extra processing on
the client side upon reception to put it into shape (creating an index,
etc.).

Ludo’.
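To make the client-side step a bit more concrete, here is a rough
Python sketch (standard `sqlite3` module only) of receiving a compact,
index-free dump and turning it into a queryable database.  The
line-oriented dump format, the schema, the sample data, and the helper
names `rebuild` and `packages_providing` are all hypothetical, not what
the current prototype uses; decompression (e.g. with zstd) is left out.

```python
import sqlite3


def rebuild(lines, conn):
    """Load a hypothetical compact dump into SQLite, then index it.

    Hypothetical dump format, one record per line:
      P <name> <version>   starts a package
      D <directory>        starts a directory of the current package
      F <file>             a file in the current directory
    """
    conn.executescript("""
        CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
        CREATE TABLE directories (id INTEGER PRIMARY KEY, name TEXT,
                                  package INTEGER REFERENCES packages(id));
        CREATE TABLE files (name TEXT,
                            directory INTEGER REFERENCES directories(id));
    """)
    package_id = directory_id = None
    for line in lines:
        tag, _, rest = line.rstrip("\n").partition(" ")
        if tag == "P":
            name, version = rest.split(" ", 1)
            package_id = conn.execute(
                "INSERT INTO packages (name, version) VALUES (?, ?)",
                (name, version)).lastrowid
        elif tag == "D":
            directory_id = conn.execute(
                "INSERT INTO directories (name, package) VALUES (?, ?)",
                (rest, package_id)).lastrowid
        elif tag == "F":
            conn.execute(
                "INSERT INTO files (name, directory) VALUES (?, ?)",
                (rest, directory_id))
        else:
            raise ValueError(f"unexpected dump line: {line!r}")
    # The index is never shipped; it is built locally after loading.
    conn.execute("CREATE INDEX IndexFiles ON files (name)")
    conn.commit()


def packages_providing(conn, file_name):
    """Return (package, version, directory) rows that contain file_name."""
    return conn.execute("""
        SELECT p.name, p.version, d.name
        FROM files f
        JOIN directories d ON f.directory = d.id
        JOIN packages p ON d.package = p.id
        WHERE f.name = ?""", (file_name,)).fetchall()


if __name__ == "__main__":
    # Made-up sample dump.
    dump = ["P hello 2.10", "D /bin", "F hello",
            "P coreutils 8.32", "D /bin", "F ls", "F cat"]
    conn = sqlite3.connect(":memory:")
    rebuild(dump, conn)
    print(packages_providing(conn, "ls"))
```

Because package and directory names appear once each in such a dump
rather than once per file, it stays close to the plain-text lower bound
before compression, and the index cost is paid only on the client.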