> What do you mean by "running live"? I am also planning to use cassandra on

I believe live as in "in production".

> EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost,
> but I/O is more than 1/4 (amazon does not give explicit I/O numbers...), so
> I think 4 small instances should perform better than 1 large one (and the
> cost is the same), am I wrong?

Without making any claims with respect to Jonathan's reasons for
saying so, I'd be hesitant for reasons including:

* 32 bit -> no mmap()

* Even assuming a fully dedicated CPU for that instance (which won't
  be true), things like background compaction, memtable flushing and
  concurrent GC would presumably tend to have a larger impact (in
  relative terms) on real traffic than on multi-CPU setups.

* I'd be generally skeptical about the potential variation in actual
  available CPU and whether that variation differs across instance
  types. It would depend, presumably, quite a bit on luck and how
  Amazon overcommits/allocates hosts for instances, but my very
  limited and very anecdotal experience is that small instances seem
  to be more over-committed (but this could be entirely wrong; don't
  take my word for it - anyone else?).

On the other hand, if the only concern is I/O bandwidth rather than
CPU use, maybe the situation is different. (Does anyone have numbers
on variation in available disk bandwidth over time for EC2 instances?)

--
/ Peter Schuller