On Fri, Mar 7, 2014 at 2:01 PM, Donald Smith < donald.sm...@audiencescience.com> wrote:
> Robert, please elaborate why you say "To make best use of Cassandra, my
> minimum recommendation is usually RF=3, N=6."
>
> I surmise that with any less than 6 nodes, you'd likely perform better
> with a sequential/single-node solution. You need at least six nodes to
> overcome the overheads from concurrency. But that's a vague explanation.

Briefly:

1) With an RF of less than 3, you are unable to meaningfully use the QUORUM ConsistencyLevel.
2) With an RF of less than 3, edge cases with potential for data loss are significantly more likely.
3) With an RF of less than 3, losing a single node means losing at least 50% of the capacity of your cluster for that range.
4) With an RF of less than 3, two replica nodes happening to Java GC at the same time means a range is unavailable.
5) With an N of less than 6, losing a single node means losing a significant percentage of total cluster capacity. Still-live nodes share the read and write load of the lost node, as well as sharing the overhead of creating its replacement.

The "real" minimum minimum to use QUORUM in production is probably RF=3, N=4 or 5. But if you are provisioning correctly, such that your nodes have some but not excessive headroom, an N of less than 6 makes losing and replacing a node relatively expensive from a total cluster capacity perspective.

=Rob
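
For anyone who wants to see the arithmetic behind the points above, here is a rough sketch. The function names are made up for illustration, and the capacity figures assume load is spread perfectly evenly across nodes, which a real cluster will only approximate. The QUORUM size is the usual floor(RF / 2) + 1.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_replica_failures(rf: int) -> int:
    """Replicas of a range that can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


def load_increase_after_one_loss(n: int) -> float:
    """Fractional extra load on each surviving node after one of n nodes is lost.

    Assumption: load is evenly spread and is shared equally by the survivors.
    """
    return 1 / (n - 1)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s), "
              f"tolerates {tolerated_replica_failures(rf)} down")

    for n in (3, 4, 5, 6):
        print(f"N={n}: one lost node is {1 / n:.0%} of capacity; "
              f"survivors each take about {load_increase_after_one_loss(n):.0%} more load")
```

With RF=1 or RF=2, QUORUM tolerates zero replicas down, which is why it only becomes meaningful at RF=3. The second loop shows how quickly the cost of a single lost node shrinks as N grows toward 6.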