I didn't get a response to this, so I'll give it another shot. I tweaked some parameters and cleaned up my schema. My Hadoop/Cassandra job got further, but it still dies with an OOM error. This time, the heap dump shows a JMXConfigurableThreadPoolExecutor with a retained heap of 7.5G. I presume this means the Hadoop job is writing to Cassandra faster than Cassandra can flush to disk. Is there anything I can do to throttle the job? The Cassandra cluster is running default configuration values except for a reduced memtable size.
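In case it helps, this is roughly the direction I was thinking of for throttling on the Hadoop side: fewer reduce tasks should mean fewer concurrent ColumnFamilyOutputFormat writers, and (if I'm reading the source right) the output format's queue.size/batch.threshold properties bound how many mutations each writer buffers. The host, keyspace, column family, and numbers below are placeholders rather than my real setup, and I'm not certain I have those property names right for 1.1.2, so corrections are welcome:

    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ThrottledLoad
    {
        // Output-side job configuration only; mapper/reducer setup omitted
        public static Job configure() throws Exception
        {
            Configuration conf = new Configuration();

            // Placeholder host/keyspace/CF names, not my actual schema
            ConfigHelper.setOutputInitialAddress(conf, "cassandra-node1");
            ConfigHelper.setOutputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setOutputColumnFamily(conf, "MyKeyspace", "MyTargetCF");

            // Guessing at these property names: bound the per-writer mutation queue
            // and flush smaller batches so fewer mutations pile up in memory at once
            conf.setInt("mapreduce.output.columnfamilyoutputformat.queue.size", 32);
            conf.setInt("mapreduce.output.columnfamilyoutputformat.batch.threshold", 32);

            Job job = new Job(conf, "populate-target-cf");
            job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
            job.setOutputKeyClass(ByteBuffer.class);
            job.setOutputValueClass(List.class);   // List<Mutation> per key

            // Fewer reduce tasks means fewer concurrent writers against the cluster
            job.setNumReduceTasks(3);

            return job;
        }
    }

Is cutting the reducer count and batch size like this a reasonable way to slow the writes down, or is there a better knob on the Cassandra side?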
Forgot to mention this is Cassandra 1.1.2.

Thanks in advance.

Brian

On Sep 12, 2012, at 7:52 AM, Brian Jeltema wrote:

> I'm a fairly novice Cassandra/Hadoop guy. I have written a Hadoop job (using
> the Cassandra/Hadoop integration API) that performs a full table scan and
> attempts to populate a new table from the results of the map/reduce. The read
> works fine and is fast, but the table insertion is failing with OOM errors
> (in the Cassandra VM). The resulting heap dump from one node shows that
> 2.9G of the heap is consumed by a JMXConfigurableThreadPoolExecutor that
> appears to be full of batch mutations.
>
> I'm using a 6-node cluster, 32G per node, 8G heap, RF=3, if any of that matters.
>
> Any suggestions would be appreciated regarding configuration changes or
> additional information I might capture to understand this problem.
>
> Thanks
>
> Brian J