On 2/2/25 12:43, Bastian Blank wrote:
> On Fri, Jan 31, 2025 at 01:46:57PM +0100, Thomas Goirand wrote:
>> I largely agree that we should reduce our use of sponsored hosting
>> space in general, and the Google (non-free) cloud platform
>> specifically. To do this, Debian would need to run its own cloud
>> platform as a replacement. I've been advocating for it, and
>> volunteered to maintain an OpenStack cloud deployment for Debian's
>> own use.
> Could you please estimate the required funds to do that? Aka hardware
> and hosting paid in full?
I can't do that without a rough estimate of our needs: how many VMs,
with how much RAM and how many vCPUs each, how much block storage on
Ceph (SSD/NVMe), how much object storage on HDD, etc. So far, this has
never been discussed.
I can set up clouds of any size, really: from 6 Raspberry Pis, for a
couple of thousand USD, to compute nodes with multiple terabytes of
RAM; from nearly zero storage available (only the system disks) to
petabytes of NVMe and exabytes of HDDs. What do we want/need exactly?
Does anyone on this list know how much we use at GCP?
Though I get that maybe you just want rough numbers. So to give you a
clue, here's what I know about hardware prices from my work. These are
really rough estimates off the top of my head; if we were to go forward
with such a project, we'd need a real quote from a hardware supplier
such as HPE, Dell or Supermicro. Note that I know the HPE pricing model
best, which is what I used here, but Dell or Lenovo have similar
prices. I have no clue what the Supermicro offer is, though.
So, a few numbers...
For just an entry-level ticket, with what I would say is the bare
minimum, if this may help:
* A 48-port 25 Gbit/s switch (capable of running Cumulus Linux) costs
around 7k USD. We'd need 2 per rack, plus a dumb (cheap) switch for IPMI.
* A compute node with 2x 112-core EPYC CPUs, with no RAM or SSDs, costs
around 13k USD. With 24x 96 GB = 2.3 TB of RAM, plus 2x 250 GB SSDs and
2x 2 TB SSDs, we're approaching more like 20k USD. Let's say we start
with 3 of them.
* A controller server (running the APIs) would be 4k USD (without RAM
and system SSDs). We would need 3 of them (for redundancy).
To sum up:
2x switch = 14k
3x compute = 60k
3x controller ~= 15k
Total: 90k USD
That's for what I would consider the bare minimum entry-level
deployment, capable of running around 500 VMs (counting 250 VMs per
compute node, and allowing one node to fail).
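
If it helps, here's that arithmetic as a quick Python sketch (the
prices are just my rough estimates from above, not actual quotes):

    # Entry-level deployment, rough prices in USD (estimates above).
    switch_price = 7_000       # 48-port 25 Gbit/s, Cumulus-capable
    compute_price = 20_000     # 2x 112-core EPYC, 2.3 TB RAM, SSDs
    controller_price = 5_000   # ~4k bare, plus RAM and system SSDs

    hardware = 2 * switch_price + 3 * compute_price + 3 * controller_price
    print(f"hardware: ~{hardware / 1000:.0f}k USD")  # ~89k, i.e. ~90k

    # 250 VMs per compute node, keeping one node spare for failure.
    print(f"capacity: {250 * (3 - 1)} VMs")          # 500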
We also need to keep in mind that such a deployment wouldn't run any
kind of distributed storage. For Ceph storage, at my work, we use
"ProLiant RL300 Gen11" servers. For a small Ceph cluster, I'd recommend
12 nodes minimum (3 mon nodes, plus 9 OSD nodes, allowing us to lose
about 10% of them without production impact). Each node is around 5 or
6k USD with 512 GB of RAM, so that's 72k USD as a start, without the
NVMe drives. For the drives, 8 TB SSDs are around 500 USD. Let's say we
put 4 in each OSD server (each can host 10 drives): that's 9x4 = 36
drives, so 36x500 ~= 20k USD of storage, for a net usable capacity of
96 TB (36x 8 TB = 288 TB raw, divided by 3 for replication). So, 96 TB
of distributed Ceph storage would cost around 90k USD (20k USD more if
we want to double the space).
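
Again as a sketch (the 3x replication factor is my assumption; it's
what makes 36 drives of 8 TB come out at 96 TB usable):

    # Small Ceph cluster sizing, using the figures above (USD).
    mon_nodes, osd_nodes = 3, 9
    node_price = 6_000             # RL300-class node with 512 GB RAM
    drives_per_osd, drive_tb, drive_price = 4, 8, 500
    replication = 3                # assumed 3x replication

    raw_tb = osd_nodes * drives_per_osd * drive_tb    # 288 TB raw
    usable_tb = raw_tb / replication                  # 96 TB usable
    cost = (mon_nodes + osd_nodes) * node_price + \
           osd_nodes * drives_per_osd * drive_price   # 72k + 18k = 90k
    print(f"{usable_tb:.0f} TB usable for ~{cost / 1000:.0f}k USD")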
Let's say we start with 3 switches, 3 controllers, 3 compute nodes, and
12 Ceph servers, each of them a single U: that'd be 21U, so half a rack
to start with, with an initial investment of roughly 200k USD as per
above. Then we may add more compute nodes (20k USD each) and storage
nodes (6k USD each, plus drives).
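
Or, to sum up the initial footprint and budget (all nodes assumed to be
a single U, as above):

    # Initial footprint and budget, from the figures above (USD).
    rack_units = 3 + 3 + 3 + 12  # switches + controllers + computes + Ceph
    initial = 90_000 + 90_000    # entry-level cloud + Ceph cluster
    print(f"{rack_units}U, ~{initial / 1000:.0f}k USD")  # 21U, ~180k

    # Growing later:
    extra_compute = 20_000       # per extra compute node
    extra_ceph = 6_000           # per extra storage node, without drives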
For a public cloud region, at Infomaniak, we started with 99 servers:
- 45 nodes for Ceph (in 3 storage availability zones, 1 PB of NVMe)
- 18 nodes for Swift
- 18 compute nodes
- 3 controllers and...
- ...3 billing servers (we call them "messaging nodes")
- 6 MariaDB/Galera nodes
- 6 network nodes
That's of course way too much for Debian's needs, but it was to show
you how far we could go if we were to spend 1M USD... :)
> At SPI we had around 50k USD per year of donations over the last
> years.
I really thought the grand total of donations for Debian was much
larger. For colocation, 50k would be more than enough to rent a full
rack with network connectivity + power (I explained above that, for the
smallest deployment, that could be enough), but not enough if we want 3
racks (meaning more redundancy).
> I can't imagine this is even remotely enough to do that all in a
> sustainable manner.
It could be, but we need to run the numbers, know what we want (and how
much of it), and hopefully have a reasonable consensus that this is how
we want to spend Debian's money. *I* believe that being independent
from any cloud sponsor is important, but I don't know if other DDs
agree.
> Also, please list the people that would be capable of running an
> OpenStack of sufficient quality for our use. You alone are far from
> enough.
I already found volunteers for it. Michal Arbet, for example, said he
would be OK to help with OpenStack maintenance. The network admin at my
work said he'd volunteer to do the Cumulus Linux setup with me and
maintain it. I'm sure I would get help with Ceph storage maintenance
from Daniel (he co-maintains Ceph with me in Debian). So installation
and maintenance is *not* the issue I'm worried about.
What I don't know is where to physically host the servers, and who
would do the hardware setup part. Just saying "I like this hosting
company" doesn't help: we'd need colocation space, ideally with 2 DDs
living nearby who are willing to do the hardware setup. Either that, or
a hosting company that would be OK to offer remote hands for free,
because we're Debian. But I don't know any that would do that. Even the
first setup (about 20 servers and 3 switches to assemble, rack, wire
up, and PXE boot) would take maybe a week of work for a single person.
Until *THIS* is solved, I don't think it's reasonable to even start
thinking about any kind of price/cost.
I hope the above helps in understanding what kind of costs we're
talking about.
Cheers,
Thomas Goirand (zigo)