Clint Byrum <spam...@debian.org> writes:

> Perhaps you missed the blog post [1] details?

> "About ten months ago, we realized that the next installation of Debian
> was upcoming, and after upgrading about 20,000 machines since Debian 6
> (aka Squeeze) was released, we got pretty tired."

> Even if the script is _PERFECT_ and handles all of the changes in
> wheezy, just scheduling downtime and doing basic sanity checks on 20,000
> machines would require an incredible effort. If you started on release
> day, and finished 2-3 machines per hour without taking any weekend days
> off, you would just barely finish in time for oldstable to reach EOL. I
> understand that they won't be done in a linear fashion, and some will
> truly be a 5 minute upgrade/reboot, but no matter how you swing it you
> are talking about a very expensive change.

A few comments here from an enterprise administration perspective:

First, if you have 20,000 machines, it's highly unlikely that each system
will be a special snowflake. In that environment, you're instead talking
about large swaths of systems that are effectively identical. You
therefore don't have to repeat your sanity checking on each individual
system, just on representatives of each class, while using your
configuration management system to ensure that all the systems in a class
are identical. And in many cases you won't have to arrange downtime at
all (because the systems are part of redundant pools).

Second, with 20,000 machines, there is no way that I would upgrade the
systems. Debian's upgrade support is very important for individual
systems, personal desktops, and smaller-scale environments, but even when
you're at the point of several dozen systems, I would stop doing upgrades.
At Stanford, we have a general policy that we rebuild systems from FAI for
new Debian releases.
All local data is kept isolated from the operating system
(or, ideally, not even on that system, which is the most common case --
data is on separate database servers or on the network file system) so
that you can just wipe the disk, build a new system on the current stable,
and put the data back on (after performing whatever related upgrade
process you need to perform). There's up-front development required for
your new service model for the new operating system release, which you
validate outside of production, and then the production rollout is
mechanical system rebuilds (which usually take under 10 minutes with FAI
and are parallelizable).

My personal opinion is that if someone is scripting an upgrade to 20,000
systems and running it on those systems one-by-one, they're doing things
at the wrong scale and with the wrong tools for that sort of environment.

-- 
Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>
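
As a rough illustration of the scale argument in this thread, the sketch
below compares the one-by-one upgrade estimate with batched parallel
rebuilds. The 2-3 machines per hour and the 10-minute rebuild ceiling come
from the messages above; the 8 working hours per day and the batch size of
50 simultaneous rebuilds are assumptions picked only for illustration, and
the function names are hypothetical.

```python
import math


def serial_upgrade_days(machines, per_hour, hours_per_day):
    """Calendar days needed when machines are upgraded one after another."""
    return machines / per_hour / hours_per_day


def parallel_rebuild_hours(machines, minutes_per_rebuild, batch_size):
    """Wall-clock hours when rebuilds run in fixed-size parallel batches."""
    batches = math.ceil(machines / batch_size)
    return batches * minutes_per_rebuild / 60


MACHINES = 20_000

# Assumption: 8 working hours per day, seven days a week.
slow = serial_upgrade_days(MACHINES, per_hour=2, hours_per_day=8)
fast = serial_upgrade_days(MACHINES, per_hour=3, hours_per_day=8)

# Assumption: 50 rebuilds at a time, each taking the full 10 minutes.
rebuild = parallel_rebuild_hours(MACHINES, minutes_per_rebuild=10, batch_size=50)

print(f"one-by-one upgrades: {fast:.0f} to {slow:.0f} days")
print(f"batched rebuilds:    {rebuild:.1f} hours of wall-clock time")
```

The rebuild figure covers only the mechanical rollout. It leaves out the
up-front development and validation described above, so it should be read
as a comparison of rollout mechanics rather than of total effort.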