> of a rogue large row is one I never considered. The largest row on the other
> nodes is as much as 800megs. I can not get a cfstats reading on the bad node

With 0.6 I can definitely see this being a problem, if I understand its behavior correctly (I have not actually used 0.6, even for testing). In particular, that much data is likely to end up directly in the old generation of the GC. The young generation is normally smaller than 800 MB, and that is before taking into account the time it takes to actually read and process those large rows, and the likelihood of a young-generation GC triggering anyway due to other normal activity.

Having a single value be 10% of the total heap size is likely to be problematic in general. The same could be said in some cases for e.g. malloc()/free() too (think of a 32-bit virtual memory space and fragmentation issues); algorithms solving general allocation problems are often not very good at dealing with extreme outliers.

> so do not know how big its largest row is. I will raise memory to 16gb and
> see if that makes a difference. I had though that the java heap sizes that
> high had issues on their own in term of GC.

The garbage collector may or may not have issues in particular cases, and to some extent the heap size is definitely a factor. However, a lot of other things play in, including the application's overall allocation behavior and pointer writing behavior. A large heap size in and of itself should not be a huge problem. If you combine a very large heap size with lots of allocation and lots of behavior that is difficult for the particular GC to deal with, you may be more likely to have problems.

My gut feeling is that Cassandra will be fine, with the worst case being having to tweak GC settings to e.g. make the concurrent mark/sweep phases kick in earlier. In other words, I would not expect Cassandra to be an application where it becomes problematic to keep CMS pause times down. However, I have no hard evidence of that.

I'd be very interested to hear if people have other experiences in production environments with very large heap sizes.

--
/ Peter Schuller
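
P.S. A rough sketch of the arithmetic behind the 10% remark above, in case it helps. The 8 GB current heap is only inferred from "800 MB is about 10% of the heap" and the 400 MB young generation is a made-up figure for illustration; neither is a measurement from the node in question. The function names are hypothetical as well.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def row_heap_share(row_bytes, heap_bytes):
    """Fraction of the total heap that a single row would occupy."""
    return row_bytes / heap_bytes


def exceeds_young_gen(row_bytes, young_gen_bytes):
    """Crude check: a row at least as large as the whole young generation
    cannot live there, so its data would have to sit in the old generation."""
    return row_bytes >= young_gen_bytes


if __name__ == "__main__":
    row = 800 * MB          # largest row seen on the other nodes
    young_gen = 400 * MB    # ASSUMPTION: illustrative young generation size

    for heap_gb in (8, 16):  # ASSUMPTION: 8 GB inferred; 16 GB is the planned size
        share = row_heap_share(row, heap_gb * GB)
        print(f"{heap_gb} GB heap: one 800 MB row is {share:.1%} of the heap")

    print("row exceeds young generation:", exceeds_young_gen(row, young_gen))
```

Doubling the heap halves the share a single row takes up, but it does not change the fact that a row of that size is larger than a typical young generation. As for making CMS kick in earlier, on HotSpot JVMs of that era the relevant flags are `-XX:CMSInitiatingOccupancyFraction=<percent>` together with `-XX:+UseCMSInitiatingOccupancyOnly`; I have no specific value to recommend.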