I probably could have saved myself some time by saying (as Peter and Edward pointed out) "if you use nodes with different capabilities you will need to treat all nodes as having the lowest spec and that could be a waste." :)
Aaron

On 23 Mar 2011, at 07:26, Peter Schuller wrote:

>> Wait! maybe this is a quadruple-whammy since we have to account for
>> the data being replicated to other nodes. At replication factor 3 only
>> 1/3rd of the data on the node actually belongs in that TokenRange, So
>> it is not as simple as having small nodes with smaller ranges, you
>> also have to consider nodes around it and somehow balance them out to.
>> (I am not convinced it can be done)
>
> This is what I was talking about.
>
> However I forgot about the memtable settings etc actually being global
> in that sense as you reiterated (this was presumably what Aaron meant
> from the start - I mis-interpreted). Solving that in a way that
> doesn't make schema management much more complex might be an
> interesting problem. Maybe having a per-node scaling factor for some
> of these things would help.
>
> But that and the RF issue seem like the major concerns.
>
> You mentioned difficulty w.r.t. not only balancing request amount but
> also differing costs per request depending on data sizes - yes, but
> that's just a fundamental problem of balancing systems like this. Just
> because some node is "twice as fast" using some particular metric
> doesn't mean that metric is the only thing of concern for your access
> pattern. I don't think Cassandra exacerbates that particularly.
>
> Virtual nodes or some other method of dispersing data across a ring in
> a more flexible way may mitigate or eliminate the RF induced problem
> (along with having other nice effects).
>
> Anyways, I agree that it is advisable to avoid mixing slim and fat
> nodes in a cluster.
>
> --
> / Peter Schuller
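
To put rough numbers on the RF point quoted above, here is a minimal sketch of how much data each node ends up storing on a toy ring. It assumes SimpleStrategy-style placement (each range is also replicated to the next RF-1 nodes clockwise), uniformly distributed keys, and a made-up ring size of 120; the function names are hypothetical and not part of any Cassandra API.

```python
from fractions import Fraction


def primary_fractions(tokens, ring_size):
    """Fraction of the ring each node owns as primary: (previous token, own token]."""
    n = len(tokens)
    fractions = []
    for i in range(n):
        width = (tokens[i] - tokens[i - 1]) % ring_size
        if width == 0:  # a single node owns the whole ring
            width = ring_size
        fractions.append(Fraction(width, ring_size))
    return fractions


def stored_fractions(tokens, ring_size, rf):
    """Fraction of all data each node stores, assuming every range is also
    replicated to the next rf-1 nodes clockwise. Node i therefore holds its
    own range plus the ranges of the rf-1 nodes before it."""
    primary = primary_fractions(tokens, ring_size)
    n = len(tokens)
    copies = min(rf, n)
    return [sum(primary[(i - k) % n] for k in range(copies)) for i in range(n)]


RING_SIZE = 120  # toy ring, not a real partitioner range
RF = 3

# Six equal nodes, evenly spaced tokens.
balanced = stored_fractions([0, 20, 40, 60, 80, 100], RING_SIZE, RF)

# Same ring, but the node at index 3 is "slim" and is given half the range
# (its token is moved from 60 to 50; the next node absorbs the difference).
skewed = stored_fractions([0, 20, 40, 50, 80, 100], RING_SIZE, RF)
```

With these toy numbers every node in the balanced ring stores 1/2 of the data. Halving the slim node's primary range only brings it down to 5/12, because it still holds replicas for the two nodes before it, while the node at index 0 goes up to 7/12. That is the "consider the nodes around it" problem in miniature.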