Thanks Chris

2010/4/26 Chris Goffinet <goffi...@digg.com>
> Upgrade to b20 of Sun's version of the JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
> -Chris
>
>
> 2010/4/26 Roland Hänel <rol...@haenel.me>
>
>> Cassandra version 0.6.1
>> OpenJDK Server VM (build 14.0-b16, mixed mode)
>> Import speed is about 10 MB/s for the full cluster; if a compaction is
>> going on, the individual node is I/O limited.
>> tpstats: caught me, I didn't know about this. I will set up a test and
>> try to catch a node during the critical time.
>>
>> Thanks,
>> Roland
>>
>>
>> 2010/4/26 Chris Goffinet <goffi...@digg.com>
>>
>>> Which version of Cassandra?
>>> Which version of the Java JVM are you using?
>>> What do your I/O stats look like when bulk importing?
>>> When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing
>>> up during the import?
>>>
>>> -Chris
>>>
>>>
>>> 2010/4/26 Roland Hänel <rol...@haenel.me>
>>>
>>>> I have a cluster of 5 machines building a Cassandra datastore, and I
>>>> load bulk data into it using the Java Thrift API. The first ~250 GB runs
>>>> fine; then one of the nodes starts to throw OutOfMemory exceptions. I'm
>>>> not using any row or index caches, and since I only have 5 CFs and about
>>>> 2.5 GB of RAM allocated to the JVM (-Xmx2500M), in theory that shouldn't
>>>> happen. All inserts are done with consistency level ALL.
>>>>
>>>> I hope with this I have avoided all the 'usual dummy errors' that lead
>>>> to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
>>>> difficult to catch the JVM at the right moment because it runs well for
>>>> several hours before this happens.
>>>>
>>>> One thing comes to mind; maybe one of the experts could confirm or
>>>> reject this idea for me: is it possible that when one machine slows down
>>>> a little bit (for example because a big compaction is going on), the
>>>> memtables don't get flushed to disk as fast as they are building up under
>>>> the continuing bulk import? That would result in a downward spiral: the
>>>> system gets slower and slower on disk I/O, but since more and more data
>>>> keeps arriving over Thrift, it finally OOMs.
>>>>
>>>> I'm using the "periodic" commit log sync; maybe this could also create
>>>> a situation where the commit log writer is too slow to keep up with the
>>>> data intake, resulting in ever-growing memory usage?
>>>>
>>>> Maybe these thoughts are just bullshit. Let me know if so... ;-)
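
Since it's hard to be watching JConsole at the exact moment the heap blows
up, one option is a tiny standalone poller that logs heap usage over JMX and
leaves a trail you can inspect after the OOM. A minimal sketch, assuming
remote JMX is enabled on the node; the host name and port 8080 are
assumptions (adjust to whatever your cassandra.in.sh configures):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import java.util.concurrent.TimeUnit;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapWatcher {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint -- replace "node1:8080" with your node's
            // actual JMX host and port.
            String url = "service:jmx:rmi:///jndi/rmi://node1:8080/jmxrmi";
            JMXConnector connector =
                    JMXConnectorFactory.connect(new JMXServiceURL(url));
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();

            // Proxy the standard java.lang:type=Memory MBean.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);

            // Poll heap usage every 10 seconds so the growth leading up to
            // the OOM is captured without watching the JVM by hand.
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("%tT heap used=%dMB committed=%dMB max=%dMB%n",
                        System.currentTimeMillis(),
                        heap.getUsed() >> 20,
                        heap.getCommitted() >> 20,
                        heap.getMax() >> 20);
                TimeUnit.SECONDS.sleep(10);
            }
        }
    }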
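On the downward-spiral theory: one way to rule out the client simply
outrunning the flush and commit-log writers is to cap the number of batches
in flight on the loader side, so the import backs off when a node slows down
instead of letting work pile up. A rough sketch of that idea only;
insertBatch() and Row are hypothetical stand-ins for whatever your Thrift
batch_mutate wrapper looks like, not a Cassandra API:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    public class ThrottledLoader {
        // Allow at most 8 batches in flight; if the cluster slows down
        // (e.g. during a big compaction), acquire() blocks and the import
        // throttles itself instead of stacking up pending writes.
        private static final int MAX_IN_FLIGHT = 8;
        private final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        public void load(List<List<Row>> batches) throws InterruptedException {
            for (final List<Row> batch : batches) {
                inFlight.acquire();           // block when too many batches are outstanding
                pool.execute(new Runnable() {
                    public void run() {
                        try {
                            insertBatch(batch);   // hypothetical: your batch_mutate call at CL.ALL
                        } finally {
                            inFlight.release();
                        }
                    }
                });
            }
            pool.shutdown();
        }

        // Placeholder type and method -- stand-ins for the real Thrift client code.
        static class Row {}
        void insertBatch(List<Row> batch) { /* call Cassandra.Client here */ }
    }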