So it sounds like my first steps are to re-implement the downloads using aria2c. This would affect the minimum base package, no? Can I get some buy-in from maintainers that such changes are acceptable?
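To make the question concrete, here is a rough sketch of what handing a download to an aria2c daemon might look like. It assumes aria2c was started with `--enable-rpc` and is listening on its default JSON-RPC endpoint; the mirror URLs, the secret, and the helper names `build_add_uri_request` and `send_request` are hypothetical and only for illustration. It is not a proposal for how the Guix integration should actually be written.

```python
import json
import urllib.request

# aria2's default JSON-RPC endpoint when started with --enable-rpc (assumed here).
ARIA2_RPC_URL = "http://localhost:6800/jsonrpc"


def build_add_uri_request(uris, secret=None, options=None, request_id="dl-1"):
    """Build a JSON-RPC 2.0 request for aria2's `aria2.addUri` method.

    `uris` is a list of URIs that all point to the same file (e.g. mirrors),
    so aria2 can fetch pieces of it from several sources in parallel.
    """
    params = []
    if secret is not None:
        # aria2 expects the RPC secret as a leading "token:<secret>" parameter.
        params.append("token:" + secret)
    params.append(list(uris))
    if options:
        # aria2 option values are passed as strings.
        params.append({key: str(value) for key, value in options.items()})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "aria2.addUri",
        "params": params,
    }


def send_request(request, url=ARIA2_RPC_URL):
    """POST the request to a running aria2 daemon and return the decoded reply."""
    data = json.dumps(request).encode("utf-8")
    http_request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(http_request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical mirror URLs for one file; nothing is sent over the network here.
    example = build_add_uri_request(
        ["https://mirror-a.example.org/pkg.tar.gz",
         "https://mirror-b.example.org/pkg.tar.gz"],
        options={"split": 4, "max-connection-per-server": 4},
    )
    print(json.dumps(example, indent=2))
```

Only `build_add_uri_request` runs without a daemon; `send_request` needs a live aria2c process and is included just to show where the payload would go.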
On Fri, Oct 13, 2023 at 2:06 PM James R. Haigh (+ML.GNU.Guix subaddress) <jrhaigh+ml.gnu.g...@runbox.com> wrote:
>
> Hi Josh,
>
> At Z-0400=2023-10-13Fri12:36:01, Josh Marshall sent:
> > This is to parallelize connections which should never hurt downloading but can help. Mirroring would be parallelizing for providing packages, what I want to implement is to parallelize obtaining packages. Server side vs client side.
>
> Please, if you are going to do something like this, please use a torrent architecture like BitTorrent or GNUnet – I suggest Aria2c as a very good CLI download backend that can be daemonised and sent instructions over a socket to add, pause, remove downloads, etc., and it supports magnet URLs including the existing nontorrent servers (via ‘as’ parameters, iirc.).
>
> I actually implemented this in a local copy of APT Daemon many years ago (circa 2011), but the change was not accepted upstream to Launchpad (because I was not on bleeding-edge; I was too slow to keep up with the upstream development). My fork got forgotten about, because to get the full benefit the server would have had to add a BitTorrent Info Hash (BTIH) to the metadata of each package, along with the MD5, SHA-256, etc. that it already provided (not a big ask, really). That said, even without the full benefit of having the metadata, it did provide immediate benefit and I used it for many years, not upgrading the Ubuntu 11.04 Natty Narwhal that I was using back then until I really had to.
>
> The immediate benefit that it provided was exactly as you described: It allowed parallelisation of nontorrent downloads, be it from the same server or from multiple mirrors. Iirc., I achieved this by simply passing the download list to Aria2c in daemon mode. I think I also converted all the HTTP URLs to ‘as’ parameters in magnet links, so that multiple mirrors could be passed using multiple ‘as’ parameters in each magnet link.
> Then I simply relied on Aria2c being amazing at parallelising everything that I had given it! I then also implemented progress updates such that APT Daemon could reflect where Aria2c was up to.
>
> The way I implemented this using Aria2c and magnet URLs meant that if additional hashes were known, they could be used as well, and so if the server metadata simply added BTIHs, it would allow swarming to occur, which in turn would massively reduce load on the central servers, and allow anyone who wants to be a mirror to be a mirror simply by seeding indefinitely. A default share ratio of 1.0 means that no user is a burden on the network, unless they deliberately change that. Users can donate to the running costs of the project simply by increasing their share ratio, which adds another means of contribution that they may find easier than the others.
>
> Anyone keen to keep old packages online can simply seed them indefinitely, so this is also really great for archival purposes. Even if the central project loses interest in the old packages and deletes them, anyone else can keep them up. The hashes ensure that they have not been tampered with.
>
> There is also a really cool benefit that occurs, or can occur, on a LAN. An entire network of computers can all swarm locally with each other, so that each package needs downloading through the metered last-mile bottleneck from the WAN precisely once – providing that local broadcasting is supported. I think this requires Avahi, and I seem to remember that Aria2c supports this but I can't remember for sure. I don't ever remember getting this bit working, but I also did not try hard because it would have required the metadata that I didn't have until after download, so even if I had got it working it would not have been directly useful unless the APT repositories that I was using included the BTIHs.
>
> So yeah, loads of great benefits to this architecture, and I highly recommend it: convert all existing URLs to magnet links (can be done client-side as I did; or server-side); optionally add any additional mirrors as additional ‘as’ parameters (again client-side or server-side); add ‘btih’ parameters to the magnet links (the BTIH must be included in the server metadata to get the full benefit of the swarming, but conversion to magnet link format can be done client-side or server-side); then simply pass all this to a really good parallelising backend such as Aria2c; then update any progress data and relay pause, resume, cancel, etc. to the backend.
>
> One final note, as I am sure that there are a lot of GNUnet fans on this list, is that I would try Aria2c first to see how well it can work, and then try GNUnet or whatever else once you have a standard to benchmark against. Both are Free Software, so no concern there. Aria2c is an all-round download manager CLI that works with or without swarming, i.e. it is just as good at HTTPS as it is at BitTorrent, and can do both at the same time. GNUnet has the advantage of working from SHA-256 iirc., which is generally already included in the metadata of the repositories of various distributions, but I think it lacks a lot of other features, as well as the stability and the ecosystem of alternative backends, compared to the BitTorrent network.
>
> Of course, there is no harm in including other hashes along with BTIH, to allow people to experiment with alternative backends, while always ensuring that what works works well. Another hash that may be useful to include is the Tiger Tree Hash, which is structurally very similar to BTIH, but stronger, iirc.
>
> The first thing that the Guix project can do to signal interest in this architecture is to simply include the BTIH of each package in the repository metadata.
> Be it in magnet URL form or not does not matter because the client can later convert that as needed. The important thing is an authoritative statement in metadata that this version of this package has this BTIH. Once that metadata is available, the game is on to implement swarming support, be it with Aria2c as a backend (as I recommend at least starting with) or otherwise.
>
> I know that this architecture works well from first-hand experience with APT Daemon, which is written in Python. The only failure I had with it was lack of upstream support. So I consider it important to first attain upstream approval before really investing more time into this. I seem to remember suggesting this to the Nix project many years ago and not getting anywhere, and now I don't have the energy to try to improve upstream projects if they reject my ideas, so I'll be interested to see whether you have any success with your attempt to do the same.
>
> Good luck! ;-)
>
> Kind regards,
> James.
> --
> Wealth doesn't bring happiness, but poverty brings sadness.
> Sent from Debian with Claws Mail, using email subaddressing as an alternative to error-prone heuristical spam filtering.
> Postal: James R. Haigh, Middle Farm, Vennington, nr. Westbury, nr. Shrewsbury, Salop, SY5 9RG, Britain
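
As an illustration of the client-side conversion step described in the quoted message (turning a set of mirror URLs into one magnet link, with the BTIH added when the metadata provides it), a minimal sketch might look like the following. The function names `build_magnet` and `parse_magnet`, the mirror URLs, and the placeholder info hash are all hypothetical. Whether a given backend such as Aria2c actually honours `as` parameters is recalled only with an "iirc." in the message and would need checking against the backend's documentation.

```python
from urllib.parse import parse_qs, quote


def build_magnet(name, mirror_urls, btih=None):
    """Build a magnet link for one file.

    name        -- display name of the file (magnet `dn` parameter)
    mirror_urls -- plain HTTP(S) URLs, each added as an `as` (acceptable source)
    btih        -- optional hex BitTorrent Info Hash (magnet `xt=urn:btih:`)
    """
    parts = []
    if btih:
        parts.append("xt=urn:btih:" + btih.lower())
    parts.append("dn=" + quote(name, safe=""))
    for url in mirror_urls:
        # Percent-encode the whole URL so it survives as a single parameter value.
        parts.append("as=" + quote(url, safe=""))
    return "magnet:?" + "&".join(parts)


def parse_magnet(link):
    """Decode a magnet link's parameters back into a dict of lists."""
    return parse_qs(link.split("?", 1)[1])


if __name__ == "__main__":
    link = build_magnet(
        "pkg-1.0.tar.gz",
        ["https://mirror-a.example.org/pkg-1.0.tar.gz",
         "https://mirror-b.example.org/pkg-1.0.tar.gz"],
        btih="0123456789ABCDEF0123456789ABCDEF01234567",  # placeholder, not a real hash
    )
    print(link)
    print(parse_magnet(link)["as"])
```

Leaving `btih` out yields a link that only lists the mirrors, which matches the "immediate benefit" case in the message where the repository does not yet publish info hashes.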