RAID0 would help me use the total disk space available at each node more
efficiently, but tests have shown that under write load it behaves much worse
than using separate data dirs, one per disk.
There are different strategies for how RAID0 splits reads, and changing the IO
scheduler and filesystem also helps. I found that ZFS/RAID-Z works best;
backups in particular are very good. If you don't plan to do backups, ext4 is
not bad either, but compactions are rather slow on it.
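
For anyone who wants to check which IO scheduler a disk is currently using
before changing it, here is a minimal sketch. It assumes a Linux box that
exposes /sys/block/<device>/queue/scheduler, where the active scheduler is
shown in square brackets. The function names and the device name are
hypothetical, not something from the tests described above.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler shown in [brackets], e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some devices expose a single value with no brackets (for example "none").
    return sysfs_text.strip()


def read_scheduler(device: str) -> str:
    # `device` is a hypothetical block device name such as "sda".
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())
```

Calling read_scheduler("sda") on each data disk shows whether the nodes being
compared actually run with the same scheduler.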
I used a 3-node cluster, and the node with RAID0 kept falling behind the
other two nodes, which had separate data dirs. The problem with separate data
dirs is that it seems to be difficult for Cassandra to use the space
efficiently because of the compactions.
If you need to think about free disk space on your nodes, then you do not
have enough storage. TB drives are cheap today, buy some. A cluster should
not be designed on the basis of "we will be lucky if all our data fits there
and we do not run out of space during major compactions".
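
As a rough way to put a number on "enough storage", here is a small sketch.
It assumes the pessimistic rule of thumb that a major compaction may
temporarily need about as much free space as the data it rewrites; the
`overhead` factor and both function names are my own assumptions, so adjust
them to what you actually observe. Note that the `used` figure is for the
whole filesystem, not only the Cassandra data on it.

```python
import shutil


def fits_major_compaction(data_bytes: int, free_bytes: int, overhead: float = 1.0) -> bool:
    """True if free space covers `overhead` times the data to be compacted."""
    return free_bytes >= data_bytes * overhead


def check_data_dir(path: str, overhead: float = 1.0) -> bool:
    # Approximation: treats everything used on this filesystem as data to compact.
    usage = shutil.disk_usage(path)
    return fits_major_compaction(usage.used, usage.free, overhead)
```

Running check_data_dir() on every data directory gives a quick yes/no per
disk; any disk that answers no is the one that will hurt during a major
compaction.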
I first tried the new Leveled compaction scheme, which seemed promising since
it would create "small" files that could be scattered across the data dirs,
but the IO necessary for this compaction scheme is enormous under write load.
Yes, it's for mostly read-only apps, but raising the base SSTable size to
something larger, like 50 MB, helps.
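
To make the 50 MB suggestion concrete, here is a sketch that builds the
statement. It assumes that the size mentioned above corresponds to the
`sstable_size_in_mb` option of LeveledCompactionStrategy and that you are on a
version that accepts the CQL 3 `compaction` map; older releases set
compaction options differently. The keyspace and table names are
placeholders.

```python
def leveled_compaction_cql(keyspace: str, table: str, sstable_size_mb: int = 50) -> str:
    """Build a CQL 3 statement switching a table to leveled compaction."""
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': 'LeveledCompactionStrategy', "
        f"'sstable_size_in_mb': {sstable_size_mb}}};"
    )


# Placeholder names; paste the printed statement into cqlsh.
print(leveled_compaction_cql("demo_ks", "events"))
```

Try it on one table first and watch the compaction IO before applying it
everywhere.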
Am I missing something here? Is this the best way to deal with this (abnormal)
use case?
It takes time to learn how to tune Cassandra properly. If you do not
have the time, hire somebody who will do it for you. It took me a few months
to master, and it's kinda difficult to explain over mail.