On 2/2/25 12:43, Bastian Blank wrote:
> On Fri, Jan 31, 2025 at 01:46:57PM +0100, Thomas Goirand wrote:
>> I largely agree that we should reduce our use of sponsored hosting space
>> in general, and the Google (non-free) cloud platform specifically. To do
>> this, Debian would need to run its own cloud platform as a replacement.
>> I've been advocating for it, and volunteered to maintain an OpenStack
>> cloud deployment for Debian's own use.

> Could you please estimate the required funds to do that?  Aka hardware
> and hosting paid in full?

I can't do that without a rough estimate of our needs: how many VMs, with how much RAM and how many vCPUs each; how much block storage on Ceph (SSD/NVMe); how much object storage on HDD; and so on. So far, this has never been discussed.

I can set up clouds of any size, really: from 6 Raspberry Pis for a couple of thousand USD, up to compute nodes with multiple terabytes of RAM; from nearly zero storage (only the system disks) up to petabytes of NVMe and exabytes of HDD. What do we want/need exactly?

Does anyone on this list know how much we use at GCP?

Though I get it that maybe you just want rough numbers. So to give you a clue, here's what I know about hardware prices from my work. These are really rough estimates off the top of my head; if we were to go forward with such a project, we'd need a real quote from a hardware supplier such as HPE, Dell or Supermicro. Note that I know HPE's pricing model best, which is what I used here, but Dell and Lenovo have similar prices. I have no clue what the Supermicro offer is, though.

So, a few numbers...

For just an entry-level ticket with what I would say is the bare minimum, if this may help:
* A 48-port 25 Gbit/s switch (capable of running Cumulus Linux) costs around 7k USD. We'd need 2 per rack, plus a dumb (cheap) switch for IPMI.
* A compute node with 2x 112-core Epyc CPUs, without RAM and SSDs, costs around 13k USD. With 24x 96 GB = 2.3 TB of RAM, plus 2x 250 GB SSDs, plus 2x 2 TB SSDs, we're approaching more like 20k USD. Let's say we start with 3 of them.
* A controller server (running the APIs), without RAM and system SSD, would be 4k USD. We would need 3 of them (for redundancy).

To sum up:
2x switch = 14k
3x compute = 60k
3x controller ~= 15k (~5k each once populated with RAM and system disks)
Total: ~90k USD

That's for what I would consider the bare-minimum entry-level deployment, capable of running around 500 VMs (counting 250 VMs per compute node, and allowing one node to fail).
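To make that arithmetic easy to replay, here is a minimal Python sketch using only the rough figures above (the 5k-per-controller figure is my rounding to match the ~15k sum, nothing more):

# Entry-level cost and capacity, replaying the rough figures above.
# Off-the-top-of-my-head prices, not vendor quotes.

SWITCH_USD = 7_000      # 48-port 25 Gbit/s switch (Cumulus Linux capable)
COMPUTE_USD = 20_000    # 2x 112-core Epyc, 2.3 TB RAM, SSDs included
CONTROLLER_USD = 5_000  # ~4k bare, rounded up so 3 of them match the ~15k sum

n_switches, n_computes, n_controllers = 2, 3, 3

total_usd = (n_switches * SWITCH_USD
             + n_computes * COMPUTE_USD
             + n_controllers * CONTROLLER_USD)

# Capacity: ~250 VMs per compute node, sized so one node can fail (N-1).
vm_capacity = 250 * (n_computes - 1)

print(f"~{total_usd // 1000}k USD for ~{vm_capacity} VMs")
# -> ~89k USD for ~500 VMs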

We also need to keep in mind that such a deployment wouldn't run any type of distributed storage. For Ceph storage, at my work, we use "ProLiant RL300 Gen11" servers. For a small Ceph cluster, I'd recommend 12 nodes minimum (3 mon nodes, plus 9 OSD nodes, allowing us to lose 10% of them without production impact). Each node is around 5 or 6k USD with 512 GB of RAM, so that's 72k USD as a start, without the NVMe drives. For the drives, 8 TB SSDs are around 500 USD. Let's say we put 4 in each OSD server (each can host 10 drives): that's 9x4 = 36 drives, so 36x500 ~= 20k USD of storage, for a net usable capacity of 96 TB (288 TB raw, divided by 3 for replication). So, 96 TB of distributed Ceph storage would cost around 90k USD (20k USD more if we want to double the space).
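Same sketch for the Ceph side; note that the 3x replication factor is my assumption, used to reconcile 36x 8 TB of raw SSD with the ~96 TB net above:

# Small Ceph cluster cost/capacity, replaying the figures above.

NODE_USD = 6_000    # ProLiant RL300 Gen11 with 512 GB RAM (upper estimate)
DRIVE_USD = 500     # one 8 TB SSD
DRIVE_TB = 8
REPLICATION = 3     # assumed Ceph replica count (not stated in this mail)

n_mon, n_osd = 3, 9
drives_per_osd = 4  # each server can host up to 10

n_drives = n_osd * drives_per_osd  # 36 drives
raw_tb = n_drives * DRIVE_TB       # 288 TB raw
net_tb = raw_tb // REPLICATION     # ~96 TB usable

total_usd = (n_mon + n_osd) * NODE_USD + n_drives * DRIVE_USD
print(f"{net_tb} TB net for ~{total_usd // 1000}k USD")
# -> 96 TB net for ~90k USD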

Let's say we start with 3 switches, 3 controllers, 3 compute nodes, and 12 Ceph servers, all of them a single U: that'd be 21U, so half a rack to start with, with an initial investment of roughly 200k USD as per the above. Then we may add more compute nodes (20k USD each) and storage nodes (6k USD each, plus drives).
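And a tiny helper for that incremental growth, assuming everything stays 1U and reusing the 500 USD / 8 TB drive price from the Ceph paragraph:

# Growing the half-rack baseline: ~200k USD and 21U to start with.

BASELINE_USD = 200_000  # 3 switches + 3 controllers + 3 computes + 12 Ceph nodes
BASELINE_U = 21

def grown(extra_compute: int = 0, extra_storage: int = 0,
          drives_per_storage: int = 4) -> tuple[int, int]:
    """Return (cost in USD, rack units used) after adding nodes."""
    cost = (BASELINE_USD
            + extra_compute * 20_000                          # compute node
            + extra_storage * (6_000                          # bare storage node
                               + drives_per_storage * 500))   # plus its SSDs
    return cost, BASELINE_U + extra_compute + extra_storage   # all 1U servers

print(grown(extra_compute=3))  # -> (260000, 24): doubled compute capacity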

For a public cloud region, at Infomaniak, we started with 99 servers:
- 45 nodes for Ceph (in 3 storage availability zones, 1PB of NVMe)
- 18 nodes for Swift
- 18 compute nodes
- 3 controllers
- 3 billing servers (we call them "messaging nodes")
- 6 MariaDB/Galera nodes
- 6 network nodes
That's of course way too much for Debian's needs, but it gives you an idea of how far we could go if we were to spend 1M USD... :)
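For the record, the tally of that region (just the numbers above, re-added):

# The Infomaniak region above, tallied per role.
region = {
    "ceph": 45, "swift": 18, "compute": 18,
    "controller": 3, "messaging": 3, "galera": 6, "network": 6,
}
assert sum(region.values()) == 99  # matches the 99 servers quoted above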

> At SPI we had around 50k USD per year of
> donations over the last years.

I really thought the grand total of donations for Debian was much larger. For colocation, 50k would be more than enough to rent a full rack with network connectivity + power (as explained above, a single rack could be enough for the smallest deployment), but not enough if we want 3 racks (meaning more redundancy).

> I can't imagine this is even remotely
> enough to do that all in a sustainable manner.

It could be, but we need to run the numbers, know what we want (and how much of it), and hopefully have a reasonable consensus that this is how we want to spend Debian's money. *I* believe that being independent from any cloud sponsor is important, but I do not know if other DDs agree.
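As a starting point for "running the numbers", here's a hedged skeleton; only the ~200k hardware estimate and the 50k/year donation figure come from this thread, everything else is a placeholder to fill in once we have real quotes:

# Sustainability skeleton: yearly cost of running our own cloud.

def yearly_cost_usd(hardware_usd: float = 200_000,  # initial estimate above
                    refresh_years: float = 5,       # assumed hardware lifetime
                    colo_usd_per_year: float = 0,   # unknown, needs real quotes
                    spares_usd_per_year: float = 0  # unknown
                    ) -> float:
    """Amortize the hardware and add the recurring costs."""
    return hardware_usd / refresh_years + colo_usd_per_year + spares_usd_per_year

DONATIONS_USD_PER_YEAR = 50_000
print(yearly_cost_usd() <= DONATIONS_USD_PER_YEAR)
# -> True (40k/year, hardware amortization alone)
print(yearly_cost_usd(colo_usd_per_year=15_000) <= DONATIONS_USD_PER_YEAR)
# -> False (55k/year, with a hypothetical 15k/year rack rent)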

> Also, please list the people that would be capable of running an
> OpenStack of sufficient quality for our use.  You alone are far from
> enough.

I already found volunteers for it. Michal Arbet, for example, said he would be OK to help with OpenStack maintenance. The network admin at my work said he'd volunteer to do the Cumulus Linux setup with me, and to maintain it. I'm sure I would get help with Ceph storage maintenance from Daniel (he co-maintains Ceph with me in Debian). So installation and maintenance is *not* an issue I'm worried about.

What I don't know is where to physically host the servers, and who would do the hardware setup part. Just saying "I like this hosting company" doesn't help: we'd need colocation space, ideally with 2 DDs living nearby who are willing to do the hardware setup. Either that, or a hosting company that would be OK to offer remote hands for free, because we're Debian; but I don't know any that would do that. Even the first setup (about 20 servers and 3 switches to assemble, rack, wire up, and PXE boot) would take maybe a week of work for a single person. Until *THIS* is solved, I don't think it's reasonable to even start thinking about any kind of price/cost.

I hope the above helps in understanding what kind of costs we're talking about.

Cheers,

Thomas Goirand (zigo)
