I'm sure there are a lot of pitfalls. A few that come to mind right now:
* With a single node, you completely lose Cassandra's high
  availability. Not only will hardware failure result in downtime,
  routine maintenance (such as a software upgrade) will too.
* RAID6 does provide redundancy in case of a disk failure. However,
  RAID doesn't prevent bit rot, and many implementations (both
  software and hardware) don't even attempt to detect it. You are
  often at the mercy of the hard drive firmware's ability to detect
  bit rot and return a URE (Unrecoverable Read Error) instead of the
  rotten data. In my experience, even enterprise drives don't always
  do a good job of this, and SSDs can fail miserably too. A corrupted
  SSTable in a single-node cluster can lead to permanent data loss,
  because Cassandra has no replica of the data on other nodes (see
  the scrub sketch after this list). Recovering the data from RAID6
  is theoretically possible, but it will almost certainly cause some
  downtime, and it won't be easy.
* Drives of the same model, from the same batch, installed in the
  same server and used in the same RAID array tend to fail at roughly
  the same time. If you aren't careful to mix and match the drives,
  you may end up with more than 2 drives failing at roughly the same
  time in your RAID6 and lose your data.
* Having two very large nodes in a cluster, either within the same DC
  with RF=2 or split into two DCs with RF=1 each, will help somewhat
  with the issues above (see the keyspace sketch after this list),
  but how long will the repairs take?
* Depending on the write rate, you may run into I/O bottlenecks,
  because compactions can involve many very large SSTables, and the
  slow repairs will make this worse. If compaction can't keep up with
  the writes, your Cassandra node will eventually crash with a "too
  many open files" error (some monitoring commands are sketched after
  this list).
* For a very large node, you may also run into memory size
  constraints; see the note on compression metadata after this list
  and
  https://cassandra.apache.org/doc/latest/operating/compression.html#operational-impact
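
To expand on the corrupted-SSTable point: if you do end up on a
single node, it's worth at least scrubbing tables periodically so
silent corruption is caught early rather than at read time. A rough
sketch (keyspace and table names are placeholders), bearing in mind
that scrub discards data it cannot read, so it is no substitute for a
real replica:

    # online scrub of a suspect table (snapshots first, then rewrites
    # the SSTables, dropping rows it cannot read)
    nodetool scrub my_keyspace my_table

    # offline variant, to be run with the node stopped, if the online
    # scrub itself fails
    sstablescrub my_keyspace my_table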
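
On the RF question: defining the topology is the easy part; whichever
way you slice it there are only two copies of each partition, and
keeping them in sync is where the cost is. A minimal sketch of the
two-DC / RF=1-each variant (keyspace and DC names are made up, they
have to match what your snitch reports):

    CREATE KEYSPACE big_data
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_main': 1,
        'dc_dr':   1
      };

    -- and then, routinely:
    --   nodetool repair --full big_data

With 6-20 TB per node, that full repair has to compare and possibly
stream a huge amount of data across DCs, which is exactly the "how
long will the repairs take" worry.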
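
For the compaction and file-handle point, these are the standard
things I'd keep an eye on (plain nodetool and OS commands, table name
again just an example):

    # pending compactions building up is the early warning sign
    nodetool compactionstats

    # SSTable count per table; if it keeps climbing, compaction is
    # losing the race against writes and repair streams
    nodetool tablestats my_keyspace.my_table

    # check the open-file limit the running JVM actually got; the
    # kernel default is far too low for a node this dense, packaged
    # installs usually raise nofile in /etc/security/limits.d/
    cat /proc/$(pgrep -f CassandraDaemon)/limits | grep 'open files'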
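
And on the memory point: the off-heap compression metadata described
in that link grows with the on-disk data size divided by the chunk
size, so on a 20 TB node it can reach several GB on its own. If reads
are as rare as you describe, larger chunks are probably an acceptable
trade (table name and chunk size are just an example; the default
chunk size differs between 3.x and 4.x):

    -- bigger chunks = fewer offsets held off-heap, at the cost of
    -- decompressing more data per read
    ALTER TABLE my_keyspace.my_table
      WITH compression = {'class': 'LZ4Compressor',
                          'chunk_length_in_kb': 256};
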
On 08/04/2021 14:56, Lapo Luchini wrote:
Hi, one project I wrote uses Cassandra to back the huge amount of
data it needs (data is written only once and read very rarely, but
needs to stay accessible for years, so the storage needs grow huge
over time; I chose Cassandra mainly for its horizontal scalability
with regard to disk size), and a client of mine needs to install it
on his hosts.
The problem is that, while I usually use a cluster of 6 "smallish"
nodes (which can grow over time), he only has big ESX servers with
huge disk space (already RAID-6 redundant) and no possibility of
running 3+ nodes per DC.
This is outside my usual experience with Cassandra and, as far as I
have read, outside most use cases found on the website or this
mailing list, so the question is:
does it make sense to use Cassandra with a big (let's talk 6TB today,
up to 20TB in a few years) single-node DataCenter, and another
single-node DataCenter (to act as disaster recovery)?
Thanks in advance for any suggestion or comment!