How long is it traditional to wait before I can bump a thread?
On Sun, Oct 15, 2023 at 2:21 PM Josh Marshall
<joshua.r.marshall.1...@gmail.com> wrote:
>
> So it sounds like my first steps are to re-implement the downloads
> using aria2c. This would affect the minimum base package, no? Can I
> get some buy-in from maintainers that such changes are acceptable?
>
> On Fri, Oct 13, 2023 at 2:06 PM James R. Haigh (+ML.GNU.Guix
> subaddress) <jrhaigh+ml.gnu.g...@runbox.com> wrote:
> >
> > Hi Josh,
> >
> > At Z-0400=2023-10-13Fri12:36:01, Josh Marshall sent:
> > > This is to parallelize connections which should never hurt downloading
> > > but can help. Mirroring would be parallelizing for providing packages,
> > > what I want to implement is to parallelize obtaining packages. Server
> > > side vs client side.
> >
> > Please, if you are going to do something like this, please use a
> > torrent architecture like BitTorrent or GNUnet – I suggest Aria2c as a very
> > good CLI download backend that can be daemonised and sent instructions over
> > a socket to add, pause, remove downloads, etc., and it supports magnet URLs
> > including the existing nontorrent servers (via ‘as’ parameters, iirc.).
> >
> > I actually implemented this in a local copy of APT Daemon many
> > years ago (circa 2011), but the change was not accepted upstream to
> > Launchpad (because I was not on bleeding-edge; I was too slow to keep up
> > with upstream development). My fork got forgotten about, because to
> > get the full benefit the server would have had to have added a BitTorrent
> > Info Hash (BTIH) to the metadata of each package, along with the MD5,
> > SHA-256, etc. that it already included (not a big ask, really). That said,
> > without the full benefit of having the metadata, it did provide immediate
> > benefit and I used it for many years, not upgrading my Ubuntu 11.04 Natty
> > Narwhal that I was using back then until I really had to.
> >
> > The immediate benefit that it provided was exactly as you
> > described: It allowed parallelisation of nontorrent downloads, be it from
> > the same server or from multiple mirrors. Iirc., I achieved this by simply
> > passing the download list to Aria2c in daemon mode. I think I also
> > converted all the HTTP URLs to ‘as’ parameters in magnet links, so that
> > multiple mirrors could be passed using multiple ‘as’ parameters in each
> > magnet link. Then I simply relied on Aria2c being amazing at parallelising
> > everything that I had given it! I then also implemented progress updates
> > such that APT Daemon could reflect where Aria2c was up to.
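A minimal sketch of the hand-off described above, assuming aria2c is running
with its JSON-RPC interface enabled. It uses the documented `aria2.addUri`
method, which accepts several URIs for the same file, rather than the magnet
`as` route James recalls. The mirror URLs and request id are hypothetical,
and the request is only built here, not sent.

```python
import json


def add_uri_request(mirrors, request_id, secret=None, options=None):
    """Build an aria2.addUri JSON-RPC request for ONE file offered by several mirrors.

    aria2 treats all URIs in the list as sources for the same resource and
    spreads connections across them.
    """
    params = []
    if secret is not None:
        # aria2 expects the RPC secret as a leading "token:<secret>" parameter.
        params.append(f"token:{secret}")
    params.append(list(mirrors))
    if options:
        # Option values are passed as strings, e.g. {"split": "4"}.
        params.append(dict(options))
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "aria2.addUri",
        "params": params,
    }


def progress_fraction(status):
    """Turn an aria2.tellStatus-style result into a 0.0..1.0 progress value.

    aria2 reports totalLength and completedLength as decimal strings.
    """
    total = int(status.get("totalLength", "0"))
    done = int(status.get("completedLength", "0"))
    return done / total if total else 0.0


# Hypothetical mirrors for a single package file.
request = add_uri_request(
    ["https://mirror-a.example/hello-2.12.tar.gz",
     "https://mirror-b.example/hello-2.12.tar.gz"],
    request_id="hello-2.12",
    options={"split": "4"},
)
print(json.dumps(request, indent=2))
```

The printed JSON would be POSTed to the daemon's RPC endpoint, and
`progress_fraction` is the kind of helper a front end could call on each
`aria2.tellStatus` reply to relay progress.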
> >
> > The way I implemented this using Aria2c and magnet URLs meant that
> > if additional hashes were known, they could be used as well. So if the
> > server metadata simply added BTIHs, swarming could occur, which in turn
> > would massively reduce load on the central servers and allow anyone who
> > wants to be a mirror to become one simply by
> > seeding indefinitely. A default share ratio of 1.0 means that no user is a
> > burden on the network, unless they deliberately change that. Users can
> > donate to the running costs of the project simply by increasing their share
> > ratio, which adds another means of contribution that they may find easier
> > than the others.
> >
> > Anyone keen to keep old packages online can simply seed them
> > indefinitely, so this is also really great for archival purposes. Even if
> > the central project loses interest in the old packages and deletes them,
> > anyone else can keep them up. The hashes ensure that they have not been
> > tampered with.
> >
> > There is also a really cool benefit that occurs, or can occur, on a
> > LAN. An entire network of computers can all swarm locally with each other,
> > so that each package only needs to come through the metered last-mile
> > bottleneck from the WAN precisely once, provided that local
> > broadcasting is supported. I think this requires Avahi, and I seem to
> > remember that Aria2c supports this, but I am not certain. I don't
> > remember ever getting this bit working, but I also did not try hard because it
> > would have required the metadata that I didn't have until after download,
> > so even if I got it working it would not have been directly useful unless
> > the APT repositories that I was using would include the BTIHs.
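For reference, a small sketch of how a client might start the daemon with
the share ratio and LAN discovery discussed above. The flags are ones listed
in the aria2c manual (`--enable-rpc`, `--rpc-listen-port`, `--seed-ratio`,
`--bt-enable-lpd`). Whether BitTorrent Local Peer Discovery fully covers the
LAN case James describes is an assumption I have not tested. The helper name
is hypothetical.

```python
import shlex


def aria2c_daemon_args(rpc_port=6800, seed_ratio=1.0, local_peer_discovery=True):
    """Return an argv list for running aria2c as an RPC-controlled backend."""
    return [
        "aria2c",
        "--enable-rpc",                    # accept JSON-RPC/XML-RPC commands
        f"--rpc-listen-port={rpc_port}",
        f"--seed-ratio={seed_ratio}",      # stop seeding once this ratio is reached
        # BitTorrent Local Peer Discovery; off by default in aria2.
        f"--bt-enable-lpd={'true' if local_peer_discovery else 'false'}",
    ]


# Print the command line instead of spawning the process.
print(shlex.join(aria2c_daemon_args()))
```

A user who wants to donate bandwidth would simply raise `seed_ratio`.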
> >
> > So yeah, loads of great benefits to this architecture, and I
> > highly recommend it: convert all existing URLs to magnet links (can be done
> > client-side as I did; or server-side); optionally add any additional
> > mirrors as additional ‘as’ parameters (again client-side or server-side);
> > add ‘btih’ parameters to the magnet links (the BTIH must be included in the
> > server metadata to get the full benefit of the swarming, but conversion to
> > magnet link format can be done client-side or server-side); then simply
> > pass all this to a really good parallelising backend such as Aria2c; then
> > update any progress data and relay pause, resume, cancel, etc. to the
> > backend.
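The client-side conversion step listed above could look roughly like this.
It is a sketch of the magnet URI conventions only (`xt=urn:btih:` for the
info hash, `dn` for a display name, `as` for an acceptable source). Whether
a given backend honours repeated `as` parameters is something to verify
against that backend. The function name and URLs are hypothetical.

```python
from urllib.parse import quote


def magnet_link(sources, btih=None, name=None):
    """Build a magnet URI from mirror URLs and an optional BitTorrent info hash.

    sources: iterable of plain HTTP(S) URLs for the same file.
    btih:    hex-encoded info hash, if the repository metadata provides one.
    name:    optional display name.
    """
    parts = []
    if btih:
        parts.append("xt=urn:btih:" + btih.lower())
    if name:
        parts.append("dn=" + quote(name, safe=""))
    for url in sources:
        # Each mirror becomes one percent-encoded 'as' parameter.
        parts.append("as=" + quote(url, safe=""))
    return "magnet:?" + "&".join(parts)


print(magnet_link(
    ["https://mirror-a.example/hello-2.12.tar.gz",
     "https://mirror-b.example/hello-2.12.tar.gz"],
    name="hello-2.12.tar.gz",
))
```

Without a `btih` the link only carries mirrors. Once the metadata publishes
one, the same function yields a link that a swarming client can use.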
> >
> > One final note, as I am sure that there are a lot of GNUnet fans on
> > this list, is that I would try Aria2c first to see how well it can work,
> > and then try GNUnet or whatever else once you have a standard to benchmark
> > against. Both are Free Software, so no concern there. Aria2c is an
> > all-round download manager CLI that works with or without swarming, i.e. it
> > is just as good at HTTPS as it is BitTorrent, and can do both at the same
> > time. GNUnet has the advantage of working from SHA-256 iirc., which is
> > generally already included in the metadata of the repositories of various
> > distributions, but I think it lacks a lot of other features and stability
> > and ecosystem of alternative backends, compared to the BitTorrent network.
> >
> > Of course, there is no harm in including other hashes along with
> > BTIH, to allow people to experiment with alternative backends, while always
> > ensuring that what works works well. Another hash that may be useful to
> > include is the Tiger Tree Hash, which is structurally very similar to BTIH,
> > but stronger, iirc.
> >
> > The first thing that the Guix project can do to signal interest in
> > this architecture is to simply include the BTIH of each package in the
> > repository metadata. Be it in magnet URL form or not does not matter
> > because the client can later convert that as needed. The important thing
> > is an authoritative statement in metadata that this version of this package
> > has this BTIH. Once that metadata is available, the game is on to
> > implement swarming support, be it with Aria2c as a backend (as I recommend
> > at least starting with) or otherwise.
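To make the metadata request concrete, here is a sketch of how a version 1
BTIH is derived for a single-file torrent: it is the SHA-1 of the bencoded
"info" dictionary. Note that the result depends on the chosen piece length
and file name as well as the file contents, so the metadata would need to
fix those too. The 256 KiB piece length and the function names are my
assumptions, not anything Guix has decided.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_btih(name, data, piece_length=262144):
    """Return the hex v1 info hash for one file held in memory."""
    # Concatenated SHA-1 digests of each fixed-size piece of the file.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "length": len(data),
        "name": name,
        "piece length": piece_length,
        "pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).hexdigest()


print(single_file_btih("hello-2.12.tar.gz", b"example package contents"))
```

A real implementation would stream the file from disk rather than hold it in
memory, but the hash it publishes would be computed the same way.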
> >
> > I know that this architecture works well from first-hand
> > experience with APT Daemon written in Python. The only failure I had with
> > it was lack of upstream support. So I consider it important to first
> > attain the upstream approval before really investing more time into this.
> > I seem to remember suggesting this to the Nix project many years ago and
> > didn't get anywhere, and now I don't have the energy to try to improve
> > upstream projects if they reject my ideas, so I'll be interested to see
> > whether you have any success with your attempt to do the same.
> >
> > Good luck! ;-)
> >
> > Kind regards,
> > James.
> > --
> > Wealth doesn't bring happiness, but poverty brings sadness.
> > Sent from Debian with Claws Mail, using email subaddressing as an
> > alternative to error-prone heuristical spam filtering.
> > Postal: James R. Haigh, Middle Farm, Vennington, nr. Westbury, nr.
> > Shrewsbury, Salop, SY5 9RG, Britain