I am running an 8-node Cassandra cluster, with each node on its own dedicated VM.
My app very quickly populates the database with about 100,000 rows of data (each row is about 100K bytes) times the number of nodes in my cluster, so there are about 100,000 rows of data on each node (the distribution seems very even). I have been running my app fairly successfully, but today I changed the replication factor from 1 to 3. (I first took down the servers, nuked their data directories, copied the new storage-conf.xml to each node, then restarted the servers.)

My app begins by populating the database with fresh data. During the writing phase, all the Cassandra servers, one by one, started getting an out-of-memory exception. Here's the output from the first to die:

INFO [COMMIT-LOG-WRITER] 2010-06-10 14:18:54,609 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1276193883235.log)
INFO [ROW-MUTATION-STAGE:5] 2010-06-10 14:18:55,499 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@19571399
INFO [GMFD:1] 2010-06-10 14:19:01,556 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,136 Gossiper.java (line 568) InetAddress /10.254.242.228 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,137 Gossiper.java (line 568) InetAddress /10.201.207.129 is now UP
INFO [GMFD:1] 2010-06-10 14:20:36,922 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GC inspection] 2010-06-10 14:19:03,722 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2164 ms, 8754168 reclaimed leaving 1070909048 used; max is 1174339584
INFO [GC inspection] 2010-06-10 14:21:09,068 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2151 ms, 78896080 reclaimed leaving 994679752 used; max is 1174339584
INFO [Timer-1] 2010-06-10 14:21:09,068 Gossiper.java (line 179) InetAddress /10.198.37.241 is now dead.
INFO [Timer-1] 2010-06-10 14:21:12,045 Gossiper.java (line 179) InetAddress /10.210.69.221 is now dead.
INFO [GMFD:1] 2010-06-10 14:21:12,046 Gossiper.java (line 568) InetAddress /10.210.203.210 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.192.218.117 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,307 Gossiper.java (line 568) InetAddress /10.254.138.226 is now UP
ERROR [ROW-MUTATION-STAGE:25] 2010-06-10 14:21:15,127 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:25,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:84)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:29)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:117)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:108)
        at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:359)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:369)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:322)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:45)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
ERROR [ROW-MUTATION-STAGE:18] 2010-06-10 14:21:15,129 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:18,5,main]

Within 15 minutes, all 8 nodes died while my app continued trying to populate the database. Is there something I am doing wrong?
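
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes "100K bytes" means 100,000 bytes per row and that rows are spread perfectly evenly, and it ignores all Cassandra storage and serialization overhead. The function and constant names are only illustrative. The heap figures are copied from the two GC inspection lines in the log above.

```python
# Rough sizing sketch; all names here are illustrative, not Cassandra settings.
ROW_BYTES = 100_000       # assumption: "100K bytes" read as 100,000 bytes
ROWS_PER_NODE = 100_000   # about 100,000 rows written per node
NODES = 8


def per_node_bytes(replication_factor):
    """Row data each node holds, assuming a perfectly even spread."""
    total_rows = ROWS_PER_NODE * NODES
    return total_rows * ROW_BYTES * replication_factor // NODES


# Values taken from the "GC for ConcurrentMarkSweep" log lines.
HEAP_MAX = 1174339584
USED_AFTER_GC = [1070909048, 994679752]


def heap_fraction(used, heap_max=HEAP_MAX):
    """Fraction of the heap still occupied after a collection."""
    return used / heap_max


if __name__ == "__main__":
    for rf in (1, 3):
        print(f"RF={rf}: {per_node_bytes(rf) / 1e9:.0f} GB of row data per node")
    for used in USED_AFTER_GC:
        print(f"heap still {heap_fraction(used):.1%} full after CMS")
```

By this estimate each node goes from roughly 10 GB to roughly 30 GB of row data when the replication factor changes from 1 to 3, and the heap was still about 91% and 85% full after the two ConcurrentMarkSweep collections shown in the log.
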
I am populating the database very quickly by writing 100 rows at once in each of 8 clients, until each client has written 100,000 rows. All of my Cassandra servers are started up with 1GB of heap space:

/usr/bin/java -ea -Xms128M -Xmx1G …

Thank you for your help!

Julie