I'm a fairly novice Cassandra/Hadoop user. I have written a Hadoop job (using the Cassandra/Hadoop integration API) that performs a full table scan and attempts to populate a new table from the results of the map/reduce. The read side works fine and is fast, but the inserts into the new table are failing with OutOfMemory errors (in the Cassandra JVM, not the Hadoop task JVMs). The resulting heap dump from one node shows that 2.9G of the heap is consumed by a JMXConfigurableThreadPoolExecutor whose queue appears to be full of batch mutations.
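For context, the write side of the job is wired up roughly like this. This is only a minimal sketch of the driver against the older org.apache.cassandra.hadoop API; the keyspace/table names, seed address, partitioner, and especially the batch-threshold property name are assumptions and vary by Cassandra version:

    import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CopyTableJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "copy-table");
            job.setJarByClass(CopyTableJob.class);

            // ... mapper/reducer classes and the input side (ColumnFamilyInputFormat,
            // slice predicate, etc.) omitted for brevity ...

            // Write side: the output format batches mutations per task and flushes
            // them to Cassandra through a background client pool.
            job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
            Configuration jobConf = job.getConfiguration();
            ConfigHelper.setOutputInitialAddress(jobConf, "10.0.0.1");                      // hypothetical seed node
            ConfigHelper.setOutputPartitioner(jobConf, "org.apache.cassandra.dht.Murmur3Partitioner");
            ConfigHelper.setOutputColumnFamily(jobConf, "my_ks", "dest_cf");                // hypothetical names

            // Throttle how many mutations accumulate before a flush; the exact property
            // name below is an assumption and depends on the Cassandra version in use.
            jobConf.setInt("mapreduce.output.columnfamilyoutputformat.batch.threshold", 32);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The reducers emit mutations as fast as they can, so if the batch/queue thresholds are left at large values the coordinator side can end up holding a lot of pending mutations in memory, which matches what I'm seeing in the heap dump.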
I'm using a 6-node cluster, 32G RAM per node, 8G Cassandra heap, RF=3, if any of that matters. Any suggestions regarding configuration changes, or additional information I could capture to understand this problem, would be appreciated.

Thanks,
Brian J