> What do you mean by "running live"? I am also planning to use Cassandra on

I believe "live" as in "in production".

> EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost,
> but I/O is more than 1/4 (Amazon does not give explicit I/O numbers...), so
> I think 4 small instances should perform better than 1 large one (and the
> cost is the same), am I wrong?
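For reference, the arithmetic behind the question is roughly the following. This is only a sketch: it takes the stated per-instance ratios (1/4 CPU, 1/4 cost) at face value, and io_ratio = 1/3 is a made-up stand-in for "more than 1/4", since there are no published I/O numbers. It compares aggregate totals only, which is exactly what the points that follow call into question.

```python
from fractions import Fraction


def aggregate(n_small, cpu_ratio, cost_ratio, io_ratio):
    """Totals for n_small instances, in units of one large instance.

    Each ratio is per small instance, relative to one large instance.
    """
    return {
        "cpu": n_small * cpu_ratio,
        "cost": n_small * cost_ratio,
        "io": n_small * io_ratio,
    }


# Ratios from the question; io_ratio = 1/3 is hypothetical ("more than 1/4").
totals = aggregate(4, Fraction(1, 4), Fraction(1, 4), Fraction(1, 3))
print(totals)  # {'cpu': Fraction(1, 1), 'cost': Fraction(1, 1), 'io': Fraction(4, 3)}
```

On paper, four small instances match one large one on CPU and cost and come out ahead on I/O. The catch is that "1 CPU in total" split over four single-core guests is not the same as one multi-core guest.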

Without making any claims with respect to Jonathan's reasons for
saying so, I'd be hesitant for reasons including:

* 32 bit -> no mmap() (small instances are 32-bit, and a 32-bit
address space is too small to mmap() large data files)

* Even assuming a fully dedicated CPU for that instance (which won't
be true), things like background compaction, memtable flushing and
concurrent GC would presumably tend to have a larger impact (in
relative terms) on real traffic than on multi-CPU setups, since on a
single core they compete directly with request handling.

* I'd be generally skeptical about how much the actually available
CPU varies, and whether that variation differs across instance
types. It would depend, presumably, quite a bit on luck and on how
Amazon overcommits/allocates hosts for instances, but my very limited
and very anecdotal experience is that small instances seem to be more
overcommitted (but this could be entirely wrong; don't take my word
for it - anyone else?).
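One way to replace anecdotes with numbers might be the "steal" counter that Linux guests report in /proc/stat: time during which this guest had work to run but the hypervisor was running something else. The sketch below assumes a Linux guest whose kernel exposes the steal field (the 8th value on the aggregate `cpu` line). The function names are made up for illustration. It takes two snapshots and computes the stolen fraction of elapsed CPU time. Logging this over a few days on both small and large instances would show whether small ones really are more overcommitted.

```python
import time


def read_cpu_counters(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            if len(values) < 8:
                raise ValueError("kernel does not report a steal field")
            return values
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before, after):
    """Fraction of elapsed CPU time stolen by the hypervisor between samples.

    Field order: user nice system idle iowait irq softirq steal [guest ...].
    guest time is already included in user/nice, so only the first eight
    fields are summed.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return deltas[7] / total if total else 0.0


def sample_steal(interval_s=5.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        before = read_cpu_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = read_cpu_counters(f.read())
    return steal_fraction(before, after)
```

Usage on an instance would be something like `print(f"{sample_steal():.1%} stolen")` from cron, keeping the output per instance type for comparison.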

On the other hand, if the only concern is I/O bandwidth rather than
CPU use, maybe the situation is different. (Does anyone have numbers on
variation in available disk bandwidth over time for EC2 instances?)
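For anyone wanting to collect such numbers, here is a rough sketch. It repeatedly writes a fixed-size temporary file, fsyncs it, and records MiB/s, then summarizes the spread as mean, standard deviation and coefficient of variation. The sizes, the pause between rounds and the choice of summary are all assumptions. It also only measures sequential writes with fsync, not the mix of reads and writes a Cassandra node actually does.

```python
import os
import statistics
import tempfile
import time


def write_throughput_mib_s(size_mib=64, directory=None):
    """Write size_mib MiB to a temp file, fsync it, and return MiB/s."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mib / elapsed


def summarize(samples):
    """Mean, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean if mean else 0.0}


def sample_over_time(rounds=10, pause_s=60.0, **kwargs):
    """Take `rounds` throughput samples, pause_s seconds apart."""
    samples = []
    for i in range(rounds):
        samples.append(write_throughput_mib_s(**kwargs))
        if i < rounds - 1:
            time.sleep(pause_s)
    return summarize(samples)
```

Pointing `directory` at the instance's ephemeral disk (wherever it is mounted) and running `sample_over_time()` at different times of day on each instance type would give a first, admittedly crude, picture of the variation.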

-- 
/ Peter Schuller
