I am also curious about this question. Say your use case is to store
10 PBytes of data in a new server room / data center with new equipment;
what makes the most sense? If your database is primarily writes with
little read, I think you'd want to maximize disk capacity per unit of
rack space. So you might opt for a 2U server with 24 3.5" disks at
16 TBytes each, for a node with 384 TBytes of disk - so ~27 servers
for 10 PBytes.
Cassandra doesn't seem to be a good choice for that configuration; the
rule of thumb that I keep hearing is ~2 TBytes per node, in which case
we'd need over 5000 servers. That seems really unreasonable.
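The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The per-server and per-node figures are the assumptions from this thread, not official Cassandra sizing guidance:

```python
# Back-of-the-envelope capacity planning for ~10 PB of raw data.
# All figures are the assumptions discussed in this thread.
PB = 1000 ** 5
TB = 1000 ** 4

total_data = 10 * PB

# Dense-storage option: 2U chassis, 24 x 3.5" drives at 16 TB each.
disks_per_server = 24
disk_size = 16 * TB
per_server = disks_per_server * disk_size      # 384 TB per server
dense_servers = -(-total_data // per_server)   # ceiling division

# Conventional rule of thumb: ~2 TB of data per Cassandra node.
per_node_rule = 2 * TB
rule_nodes = -(-total_data // per_node_rule)

print(f"dense 2U servers needed: {dense_servers}")  # 27
print(f"nodes at ~2 TB/node:     {rule_nodes}")     # 5000
```

Note that neither number accounts for replication; with the usual replication factor of 3, the raw disk requirement (and hence the node count) roughly triples.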
-Joe
On 4/8/2021 9:56 AM, Lapo Luchini wrote:
Hi, one project of mine uses Cassandra to back the huge amount of data
it needs (data is written only once and read very rarely, but has to
remain accessible for years, so the storage requirements grow huge
over time; I chose Cassandra mainly for its horizontal scalability
with respect to disk size), and a client of mine needs to install it
on his own hosts.
The problem is that, while I usually run a cluster of 6 "smallish"
nodes (which can grow over time), he only has big ESX servers with
huge disk space (which is already RAID-6 redundant) and wouldn't be
able to run 3+ nodes per DC.
This is outside my usual experience with Cassandra and, as far as I
can tell, outside most use cases described on the website or this
mailing list, so the question is:
does it make sense to run Cassandra with a big (let's say 6 TB today,
up to 20 TB in a few years) single-node DataCenter, plus another
single-node DataCenter (to act as disaster recovery)?
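For what it's worth, the topology you describe would correspond to a keyspace replicated once into each logical DC, something like the following sketch (the keyspace and DC names are made up for illustration; yours would come from your snitch configuration):

```sql
-- Hypothetical keyspace for two single-node logical DCs:
-- one copy in the primary DC, one in the disaster-recovery DC.
CREATE KEYSPACE archive
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_main': 1,
    'dc_dr': 1
  };
```

With RF=1 per DC there is no intra-DC redundancy, so durability within each site rests entirely on the underlying RAID-6 and on the other DC's copy.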
Thanks in advance for any suggestion or comment!