>  There are some words on the 'Net - the recent pages on
>  Riptano's site in fact - that strongly encourage scaling left
>  and right, rather than beefing up the boxes - and certainly
>  we're seeing far less bother from GC using a much smaller
>  heap - previously we'd been going up to 16GB, or even
>  higher.  That sizing was based on my previous positive
>  experiences of getting better performance from memory-hog
>  apps (e.g. Java) by giving them more memory.  In any case,
>  it seems that using large amounts of memory on EC2 is just
>  asking for trouble.

Keep in mind that while GC tends to be more efficient with larger heap
sizes, that does not always translate into better overall performance
once other factors are considered. In particular, in the case of
Cassandra, if you "waste" 10-15 GB of RAM on the JVM heap for a
Cassandra instance that could live with e.g. 1 GB, you're actively
taking those 10-15 GB away from the operating system's buffer cache.
If you're I/O bound on reads in particular, this can be very
detrimental (assuming the data set is small enough, and locality good
enough, that 15 GB of extra buffer cache makes a difference; usually,
but not always, it is).
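To make the trade-off concrete (illustrative numbers, assuming a box
with 16 GB of RAM and roughly 1 GB needed by the OS and other
processes):

    heap = 12 GB  ->  ~3 GB left for the buffer cache
    heap =  2 GB  ->  ~13 GB left for the buffer cache

Same hardware, but the second configuration can keep an extra ~10 GB
of hot SSTable data in the page cache, which is often the difference
between serving reads from memory and going to disk.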

So with Cassandra, in the general case, you definitely want to keep
your heap size reasonable in relation to the actual live set (the
amount of actually reachable data), rather than just cranking it up
as much as possible.
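If you want a rough idea of the live set, one quick sketch (the class
name and output format here are my own invention, not anything
Cassandra ships) is to force a full GC and look at used heap
afterwards, since what survives a full collection approximates the
reachable data:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class LiveSetEstimate {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            // Force a full collection so that used heap approximates
            // the live set (reachable objects only).
            mem.gc();
            MemoryUsage heap = mem.getHeapMemoryUsage();
            System.out.printf("live set ~%d MB of %d MB max heap%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getMax() / (1024 * 1024));
        }
    }

On a real node you'd rather watch used-heap-after-full-GC over time
(e.g. in the GC log or via jstat) than run a toy program, but the
principle is the same.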

(The flip side is keeping the heap high enough not to OOM, given that
exact memory demands are hard to predict; it would be absolutely
great if the JVM were better at maintaining a reasonable ratio of
heap size to live set size, so that much less tweaking of heap sizes
would be necessary, but this is not the case.)
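For completeness: HotSpot does expose -XX:MinHeapFreeRatio and
-XX:MaxHeapFreeRatio, which are meant to grow or shrink the heap so
that the fraction of free heap after GC stays within a band (the
values below are illustrative, not a recommendation):

    java -Xms1G -Xmx16G -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 ...

In practice not every collector honors them (adaptive sizing in the
throughput collector tends to override them), so they don't remove
the need to pick a sane -Xmx by hand.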

-- 
/ Peter Schuller
