I'm sure there are a lot of pitfalls. A few that come to mind right now:

 * With a single node, you completely lose the benefit of high
   availability from Cassandra. Not only will hardware failure result
   in downtime; routine maintenance (such as a software upgrade) can
   also cause downtime.
 * RAID6 does provide redundancy in case of a disk failure. However,
   RAID doesn't prevent bit rot, and many RAID implementations (both
   software and hardware) don't even attempt to detect it. You are
   often at the mercy of the hard drive firmware's ability to detect
   bit rot and return a URE (Unrecoverable Read Error) instead of the
   rotten data. In my experience, even enterprise drives don't always
   do a good job at this, and SSDs can fail miserably too. A corrupted
   SSTable in a single-node cluster could lead to permanent data loss,
   because Cassandra doesn't have a replica of the data on other
   nodes. Recovering the data from a RAID6 is theoretically possible,
   but it will almost certainly cause some downtime, and it's not
   going to be easy.
 * Drives of the same model, from the same batch, installed in the
   same server and used in the same RAID array tend to fail at roughly
   the same time. If you aren't careful to mix and match the drives,
   you may end up with more than 2 drives failing at roughly the same
   time in your RAID6, and losing your data.
 * Having two very large nodes in a cluster, either within the same DC
   with RF=2 or split into two DCs with RF=1 each, will help address
   the above issues somewhat, but how long will repairs take?
 * Depending on the write rate, you may run into I/O bottlenecks,
   because compactions can involve many very large SSTables, and this
   will be made worse by the slow repairs. If compaction can't keep up
   with the write rate, your Cassandra node is going to crash with a
   "too many open files" error.
 * For a very large node, you may also run into memory size
   constraints; see
   https://cassandra.apache.org/doc/latest/operating/compression.html#operational-impact
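
To make the correlated-failure point concrete, here is a rough sketch.
The drive count and the per-drive failure probabilities below are purely
illustrative assumptions (correlation is modelled crudely, as a higher
per-drive probability during the same window), not measurements:

```python
from math import comb

def p_at_least_k(n: int, k: int, p: float) -> float:
    """Probability that at least k of n drives fail in the same window,
    assuming each drive fails independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# RAID6 survives two concurrent failures; a third one loses the array.
# ASSUMED numbers: a 12-drive array; 1% per-drive failure probability
# in the window for well-mixed drives vs 10% for same-batch drives
# wearing out together.
print(f"mixed drives:      {p_at_least_k(12, 3, 0.01):.4%}")
print(f"same-batch drives: {p_at_least_k(12, 3, 0.10):.4%}")
```

With independent, low-probability failures a triple failure is
vanishingly rare; once wear is correlated and the per-drive probability
climbs, the same array loses data with double-digit probability, which
is exactly why mixing batches matters.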
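
And to put a rough number on the repair question: a back-of-envelope
sketch, assuming repair validates and streams data at some effective
per-node throughput. The 50 MB/s figure is just an assumption; real
throughput depends heavily on disks, network and concurrent compaction
load:

```python
# Back-of-envelope repair duration estimate.
# ASSUMPTION: effective repair throughput of ~50 MB/s per node;
# real numbers vary widely with hardware and cluster load.
def repair_days(data_tb: float, throughput_mb_s: float = 50.0) -> float:
    data_mb = data_tb * 1024 * 1024  # TB -> MB (binary units)
    seconds = data_mb / throughput_mb_s
    return seconds / 86400  # seconds -> days

print(f"6 TB:  {repair_days(6):.1f} days")
print(f"20 TB: {repair_days(20):.1f} days")
```

Even under this optimistic assumption, a full repair of the 20 TB node
takes days of sustained I/O, during which the node must also keep up
with reads, writes and compactions.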


On 08/04/2021 14:56, Lapo Luchini wrote:
Hi, one project I wrote uses Cassandra to back the huge amount of data it needs. Data is written only once and read very rarely, but needs to stay accessible for years, so the storage needs become huge over time; I chose Cassandra mainly for its horizontal scalability in disk size. Now a client of mine needs to install it on his hosts.

Problem is, while I usually use a cluster of 6 "smallish" nodes (which can grow over time), he only has big ESX servers with huge disk space (which is already RAID-6 redundant) and wouldn't be able to run 3+ nodes per DC.

This is out of my usual experience with Cassandra and, as far as I read around, out of most use-cases found on the website or this mailing list, so the question is: does it make sense to use Cassandra with a big (let's talk 6TB today, up to 20TB in a few years) single-node DataCenter, and another single-node DataCenter (to act as disaster recovery)?

Thanks in advance for any suggestion or comment!
