I am running a 10-node Cassandra 0.6.1 cluster with a replication factor of 3.
To populate the database for my read benchmarking, I have 8 applications using Thrift, each connecting to a different Cassandra server and writing 100,000 rows of data (100 KB per row) at a consistency level of ALL. My server nodes are ec2-smalls (1.7 GB memory, 100 GB disk). With consistency set to ALL, it takes each app 5-6 minutes to write 10,000 (100 KB) rows.

When each of my 8 writing apps reaches about 90,000 rows written, I start seeing write timeouts, but my app retries twice and all data appears to get written. It appears to take about 1 hr 45 min for all compaction to complete.

Coinciding with the write timeouts, all 10 of my Cassandra servers log the following exception to system.log:

 INFO [FLUSH-WRITER-POOL:1] 2010-06-15 13:13:54,411 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/Keyspace1/Standard1-359-Data.db
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:13:59,145 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
        at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
        at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:13:59,146 CassandraDaemon.java (line 78) Fatal exception in thread Thread[MESSAGE-STREAMING-POOL:1,5,main]
java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
        at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
        at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

On 8 out of 10 servers, I see this just before the exception:

 INFO [AE-SERVICE-STAGE:1] 2010-06-15 13:41:36,292 StreamOut.java (line 66) Sending a stream initiate message to /10.210.34.212
...
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:43:32,956 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor

On the other 2 servers, the AE-SERVICE stream initiate message appears about 6-9 minutes before the exception.

One other oddity: even when the server nodes are quiescent because compaction is complete, CPU usage stays at about 40%. Even after several hours with no reads or writes to the database and all compactions complete, CPU usage remains around 40%.

Thank you for your help and advice,
Julie
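P.S. In case it matters, the retry behavior in each writer app looks roughly like this. This is a simplified sketch, not the actual client code: the helper name and the stand-in operation are illustrative, and in the real app the operation is a Thrift insert at ConsistencyLevel.ALL that can throw TimedOutException.

```java
import java.util.concurrent.Callable;

public class RetryingWriter {
    static final int MAX_RETRIES = 2; // retry twice after the initial attempt

    // Run a write, retrying on failure up to MAX_RETRIES times.
    // In the real app the caught exception is Thrift's TimedOutException
    // and the Callable wraps client.insert(...) at ConsistencyLevel.ALL.
    static <T> T withRetries(Callable<T> write) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                return write.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Demo with a stand-in write that times out twice, then succeeds.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new Exception("timed out");
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With this policy a row is attempted at most three times, which matches what I see: the timeouts around 90,000 rows are absorbed by the retries and all data appears to get written.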