> Wait! Maybe this is a quadruple-whammy, since we have to account for
> the data being replicated to other nodes. At replication factor 3, only
> 1/3rd of the data on the node actually belongs to that TokenRange, so
> it is not as simple as having small nodes with smaller ranges; you
> also have to consider the nodes around it and somehow balance them out too.
> (I am not convinced it can be done.)

This is what I was talking about.
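The replica-placement effect in the quote can be sketched in a few lines. This is a toy model of SimpleStrategy-style placement on a single-token ring (the node names and token values are invented for illustration), showing why at RF=3 only about a third of what each node stores is data from its own primary range:

```python
# Toy single-token ring: a key's primary replica is the first node whose
# token is >= the key's token (wrapping around), and the remaining
# replicas are the next RF-1 nodes clockwise.
import bisect

ring = {0: "A", 25: "B", 50: "C", 75: "D"}   # token -> node (made up)
tokens = sorted(ring)

def replicas(key_token, rf=3):
    """Primary node plus the next rf-1 nodes walking clockwise."""
    i = bisect.bisect_left(tokens, key_token) % len(tokens)
    return [ring[tokens[(i + k) % len(tokens)]] for k in range(rf)]

# For each node, count total keys stored vs. keys it is primary for.
stored = {n: 0 for n in ring.values()}
primary = {n: 0 for n in ring.values()}
for key_token in range(100):
    reps = replicas(key_token)
    for n in reps:
        stored[n] += 1
    primary[reps[0]] += 1

for n in sorted(stored):
    print(n, primary[n] / stored[n])   # ~1/3 for every node at RF=3
```

So shrinking one node's primary range only shrinks a third of its data; the other two thirds are dictated by the ranges of its neighbours, which is exactly the balancing problem described above.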

However, I forgot that the memtable settings etc. are actually global
in that sense, as you reiterated (this was presumably what Aaron meant
from the start; I misinterpreted). Solving that in a way that doesn't
make schema management much more complex might be an interesting
problem. Maybe a per-node scaling factor for some of these settings
would help.

But that and the RF issue seem like the major concerns.

You mentioned the difficulty of balancing not only request volume but
also the differing cost per request depending on data size. Yes, but
that is a fundamental problem of balancing any system like this: just
because some node is "twice as fast" by some particular metric doesn't
mean that metric is the only thing of concern for your access pattern.
I don't think Cassandra particularly exacerbates it.

Virtual nodes, or some other method of dispersing data across the ring
in a more flexible way, may mitigate or eliminate the RF-induced
problem (along with having other nice effects).
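To make the vnode idea concrete, here is a small sketch (node names, capacities, and the 128-tokens-per-capacity-unit figure are all invented, not anything Cassandra prescribes) in which each physical node gets many random tokens, with the count scaled to its capacity, so a "fat" node can own a proportionally larger slice of the ring:

```python
# Sketch of virtual nodes: many random tokens per physical node, with the
# token count proportional to node capacity. Names and numbers are made up.
import random

random.seed(42)
RING = 2**32
capacities = {"slim1": 1, "slim2": 1, "fat1": 2}  # relative capacity

# Hand each node tokens in proportion to its capacity.
ring = []  # list of (token, node)
for node, cap in capacities.items():
    for _ in range(128 * cap):
        ring.append((random.randrange(RING), node))
ring.sort()

# Ownership = total width of the ring arcs each node's tokens claim.
ownership = {n: 0 for n in capacities}
for i, (tok, node) in enumerate(ring):
    prev = ring[i - 1][0] if i else ring[-1][0] - RING  # wrap around
    ownership[node] += tok - prev

total = sum(capacities.values())
for node in capacities:
    print(node, ownership[node] / RING, capacities[node] / total)
```

Because each node's share is spread over many small scattered ranges, its replicas are likewise spread over many neighbours instead of two fixed ones, which is what loosens the coupling that makes the RF problem hard with one token per node.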

Anyways, I agree that it is advisable to avoid mixing slim and fat
nodes in a cluster.

-- 
/ Peter Schuller
