Hello everyone,

you may have noticed that we had a bit of downtime with Salsa recently. What follows is a short summary of how it came about, why it took so long, and a bit about the future of Salsa.
But before we start on that, we want to thank Bastian Blank for his long work on Salsa and its Ansible infrastructure; the service would be quite a bit less maintainable without the setup he helped create.

Next, a bit of needed background about services running on machines maintained by DSA (Debian System Administrators), that is, anything in .debian.org. You may (or may not) know that we are running on volunteer-maintained machines (just as we are volunteers maintaining Salsa). DSA has put together a set of rules for how they run our machines and what they expect from services. They offer a good bunch of services (say, databases if you need one, webservers, ...) and they very much prefer software either installed from Debian (stable or backports) or software that the service admin(s) install / provide themselves.

Those requirements have led to the Salsa service *not* using the upstream-provided package (their "Omnibus" installation method), as that package is one *huge* beast, bundling everything needed and configuring everything centrally, outside of the usual known ways. Pretty obviously this goes against the basic rules from DSA. Instead, we are using the "install from source" variant, compiling stuff on our own and using Ansible to help do that in a somewhat reliable way. You can find the repository with our Ansible code at https://salsa.debian.org/salsa/salsa-ansible.

Having run Salsa for a while, the Salsa Admins found various points in the setup that can be improved, both for easier maintenance of the service and for the user experience, as the setup as it was (and still is) does have a few deficiencies. A proposal for a possible changed setup was written and circulated within Salsa Admin, DPL and DSA. The short summary is that the discussion around it took quite a long time and reached neither a useful conclusion nor an implementation of any improvements.
Due to that, Salsa has been in a kind of low-maintenance mode for the installed parts, which led to Salsa falling behind upstream versions.

Recently GitLab published a critical security fix which forced us into action: we had to disable the service and could not open it up again without upgrading it to a recent release. To be able to upgrade it, the underlying machine needed an upgrade too, from buster to bullseye. Thankfully DSA acted quickly on our request to upgrade the machine to bullseye, unblocking the upgrade path on our side to install more recent versions of GitLab. After that, we had to adjust the setup, configs and local builds to work again.

This took a fair bit of work and some more help from DSA, but the majority of the downtime was actually spent on something we could only wait for: database migrations. GitLab has changed various parts of the database across its releases and includes a way to migrate your database on upgrade. Usually this can run in parallel with the normal operation of the service, and as such it is optimized to not interrupt it. But in our situation we had to wait for the migrations to finish before we could finally upgrade to a released version that no longer included the security hole that started this upgrade round.

A big thank you has to go to Alexander Wirt, who has invested a huge amount of work and energy in dealing with this upgrade. We also have to thank the various DSA members who helped with upgrading the system and with the adjustments needed later on: Tollef Fog Heen, Aurelien Jarno and Adam D. Barratt.

With all the above, what's our current status? Simple: we are, again, on the latest released GitLab version, and while we had a few reports of errors, they appear to be fixed now, and Salsa is back in operation. We still have a few points that we want to (again) discuss with DSA to see how the setup can be adjusted, as some of the identified trouble points are still there.
But there is less pressure behind this now, as we are currently able to closely follow upstream again.

Before we get to the final point, some statistics about Salsa: Salsa currently hosts 58125 projects for 10930 users over 665 groups. It has seen 15527 forks, 36650 merge requests and 302133 notes. Salsa knows of 5789 SSH keys, and users have created 9425 issues. A total of 342812 pipelines have been run, of which 226575 were successful and 101198 failed; the remaining 15039 were either cancelled or skipped.

Salsa is running inside a virtual machine with 8 CPU cores and 32 GB of available RAM, and uses about 1.6 TB of space for the git repositories. GitLab's background job system "Sidekiq" claims it has processed 68917652 background jobs, of which it declared only 84587 as failed.

Want to help? If you are a Debian Developer[1] and interested in helping us maintain the Salsa service, including possibly digging into the "bits below" directly on the machine to make it better for the users, better to maintain, and in general just keep one huge git forge running, please feel free to mail us at salsa-ad...@debian.org. We also hang out in #salsa on irc.oftc.net, though that is mainly one of our public support channels.

--
For the salsa admins:
Joerg

[1] Sorry for that requirement, but with Salsa Admin being a delegated role, volunteers have to be members. Additionally, Salsa hosts a huge bunch of Debian repos, some of them not available to the public, but Salsa admins can see them, so we require admins to be DDs.