Denis 'GNUtoo' Carikli <gnu...@cyberdimension.org> writes:
> [[PGP Signed Part:Undecided]]
> Hi,
>
> Is there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
>
> Navit is a (car) navigation software that need maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in Navit
> source code (maptool)[1] which is not packaged yet. Binary map files can
> also be downloaded directly from various sources.
>
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled down
> pictures[2] and it takes about 89 GiB. I didn't look yet how these files
> were generated but I guess that they somehow can be generated from
> Wikipedia dumps.
>
> Packaging the binary files (without generating them) can be useful as
> it simplifies a lot the maintenance as one can just update the package
> version and checksum to update these. It also enables to keep the
> information (download URL, checksum, license) in one place and it
> enables easy reuse by Guix services and/or configuration files.
>
> If these files were generated in packages, it would also enable to
> tweak the data, for instance by adding height data in navit maps. As
> for kiwix compatible files, it would probably enable to decide when to
> make the snapshots or enable to package additional wikis
> (like the Libreplanet Wiki) or websites.
>
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?
>
> If so, does it needs to be done like with the ZFS (kernel module)
> package where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
>
> Note that so far I've only packaged locally only kiwix compatible files
> for various wikis by just downloading already prepared files, so I
> didn't look yet into navit maps or into generating all these files, so
> I might miss some details about generating them.
>
> References:
> -----------
> [1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.
>
> [[End of PGP Signed Part]]

Could ZIM files be downloaded over BitTorrent as fixed-output
derivations? They can be huge (the English Wikipedia file above is about
89 GiB), so that would take load off a single mirror. If the system also
started seeding them, that would be pretty cool.
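
To illustrate why the transport should not matter for a fixed-output
derivation: the result is accepted only if its content hash matches the
one declared in advance, however the bytes were fetched. Below is a
minimal Python sketch of that check, not Guix code. The function names
are hypothetical, and it compares hex digests for simplicity, whereas
Guix package definitions normally write the hash in base32. The file is
read in chunks so that a multi-GiB ZIM file never has to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read `chunk_size` bytes at a time, so memory use stays
    constant regardless of file size.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_output(path, expected_sha256):
    """Return True if the fetched file matches the hash declared in advance.

    `expected_sha256` is a hex string (an assumption of this sketch);
    the comparison ignores letter case.
    """
    return sha256_of_file(path) == expected_sha256.lower()
```

With a check like this, a file fetched over HTTP and one fetched from a
swarm are interchangeable as long as they hash to the same value.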