Which version of Cassandra? Which JVM version are you using? What do your I/O stats look like during the bulk import? When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing up during the import?
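If you want to catch the pools in the act, here's a rough sketch that polls the same per-stage thread pool MBeans that tpstats reads and prints any stage with work queued up. Assumptions on my part: Cassandra 0.6's default JMX port (8080) and the org.apache.cassandra.concurrent MBean domain with a PendingTasks attribute — adjust both for your version.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Polls the per-stage thread pool MBeans (the same data tpstats shows)
// and prints any stage with a non-zero backlog, so you don't have to be
// watching at the exact moment the node starts to struggle.
public class TpStatsWatch {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // 8080 is the default JMX port in 0.6 -- change if yours differs.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
        MBeanServerConnection mbs =
                JMXConnectorFactory.connect(url).getMBeanServerConnection();
        ObjectName stages = new ObjectName("org.apache.cassandra.concurrent:*");
        while (true) {
            for (ObjectName stage : mbs.queryNames(stages, null)) {
                long pending = ((Number) mbs.getAttribute(
                        stage, "PendingTasks")).longValue();
                if (pending > 0) {
                    System.out.println(System.currentTimeMillis() + " "
                            + stage.getKeyProperty("type")
                            + " pending=" + pending);
                }
            }
            Thread.sleep(5000); // poll every 5 seconds
        }
    }
}
```

A steadily climbing pending count on the flush or commit log stages right before the OOM would tell you a lot.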
-Chris

2010/4/26 Roland Hänel <rol...@haenel.me>

> I have a cluster of 5 machines forming a Cassandra datastore, and I load
> bulk data into it using the Java Thrift API. The first ~250 GB runs fine;
> then one of the nodes starts to throw OutOfMemory exceptions. I'm not using
> any row or index caches, and since I only have 5 CFs and some 2.5 GB of RAM
> allocated to the JVM (-Xmx2500M), in theory that shouldn't happen. All
> inserts are done with consistency level ALL.
>
> I hope with this I have avoided all the usual dummy errors that lead to
> OOMs. I have begun to troubleshoot the issue with JMX; however, it's
> difficult to catch the JVM at the right moment because it runs well for
> several hours before this happens.
>
> One thing comes to mind; maybe one of the experts could confirm or reject
> this idea for me: is it possible that when one machine slows down a little
> (for example because a big compaction is going on), the memtables don't get
> flushed to disk as fast as they build up under the continuing bulk import?
> That would result in a downward spiral: the system gets slower and slower
> on disk I/O, but more and more data keeps arriving over Thrift, until
> finally OOM.
>
> I'm using the "periodic" commit log sync; maybe this too could create a
> situation where the commit log writer is too slow to keep up with the data
> intake, resulting in ever-growing memory usage?
>
> Maybe these thoughts are just bullshit. Let me know if so... ;-)
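P.S. One cheap way to rule the flush-backlog theory in or out from the client side is to pace the loader whenever inserts start taking longer. A minimal sketch against the 0.6 Thrift API — Keyspace1/Standard1 are the stock sample schema names, and the 50 ms back-off threshold is an arbitrary number I picked, not a recommendation:

```java
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Bulk loader with crude client-side back-pressure: when an insert at
// ConsistencyLevel.ALL slows down (e.g. a node is busy compacting), sleep
// briefly instead of continuing to pile writes into the memtables.
public class PacedLoader {
    public static void main(String[] args) throws Exception {
        TTransport tr = new TSocket("localhost", 9160); // default Thrift port
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
        tr.open();

        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("data".getBytes("UTF-8"));

        for (long i = 0; i < 1000000; i++) {
            long start = System.nanoTime();
            client.insert("Keyspace1", "row-" + i, path,
                    ("value-" + i).getBytes("UTF-8"),
                    System.currentTimeMillis() * 1000, // microsecond convention
                    ConsistencyLevel.ALL);
            long micros = (System.nanoTime() - start) / 1000;
            if (micros > 50000) {            // insert took >50 ms: back off
                Thread.sleep(micros / 1000); // sleep roughly as long again
            }
        }
        tr.close();
    }
}
```

If pacing the client makes the OOMs go away, that points at flush (or the commit log writer) falling behind rather than a genuine leak.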