I am running a 10-node Cassandra 0.6.1 cluster with a replication factor of 3.

To populate the database for my read benchmarking, I have 8 applications 
using Thrift, each connecting to a different Cassandra server and writing 
100,000 rows of data (100 KB each row), using a ConsistencyLevel of ALL. My 
server nodes are EC2 smalls (1.7 GB memory, 100 GB disk).
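
For scale, here is a quick back-of-envelope check of the data volume those numbers imply (all figures are from the description above; I am assuming 100 KB means 100 * 1024 bytes):

```java
// Back-of-envelope sizing for the benchmark workload described above.
public class Sizing {
    public static void main(String[] args) {
        long apps = 8;
        long rowsPerApp = 100_000;
        long rowBytes = 100L * 1024;       // assumption: 100 KB = 100 * 1024 bytes

        long rawBytes = apps * rowsPerApp * rowBytes;  // data as written once
        long replicated = rawBytes * 3;                // replication factor 3
        long perNode = replicated / 10;                // spread over 10 nodes

        long gib = 1L << 30;
        System.out.println("raw:        " + rawBytes / gib + " GiB");
        System.out.println("replicated: " + replicated / gib + " GiB");
        System.out.println("per node:   " + perNode / gib + " GiB");
    }
}
```

So each node should end up holding on the order of 23 GiB of replicated data, well under the 100 GB disks, so raw capacity does not look like the obvious problem.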

With consistency set to ALL, it takes 5-6 minutes for each app to write 
10,000 (100 KB) rows. When each of my 8 writing apps reaches about 90,000 rows 
written, I start seeing write timeouts, but my app retries twice and all data 
appears to get written.
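
The retry behavior in my writer apps is essentially the following (a minimal sketch; the names here are illustrative, not my actual benchmark code, and the real apps catch Thrift's timeout exception specifically):

```java
import java.util.concurrent.Callable;

// Sketch of the bounded retry the writer apps use on a write timeout.
public class RetryingWriter {
    static final int MAX_RETRIES = 2;   // "retries twice" after the first attempt

    // Runs the write; on failure (e.g. a Thrift timeout), retries up to
    // MAX_RETRIES more times before giving up.
    static <T> T withRetries(Callable<T> write) {
        Exception last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                return write.call();
            } catch (Exception e) {
                last = e;               // remember the failure and try again
            }
        }
        throw new RuntimeException("write failed after retries", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate two timeouts followed by a success on the third attempt.
        String r = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("simulated timeout");
            return "written";
        });
        System.out.println(r + " after " + calls[0] + " attempts");
    }
}
```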

It appears to take about 1 hr 45 min for all compaction to complete.

Coinciding with my write timeouts, all 10 of my Cassandra servers are logging 
the following exception to system.log:


 INFO [FLUSH-WRITER-POOL:1] 2010-06-15 13:13:54,411 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/Keyspace1/Standard1-359-Data.db
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:13:59,145 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
        at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
        at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:13:59,146 CassandraDaemon.java (line 78) Fatal exception in thread Thread[MESSAGE-STREAMING-POOL:1,5,main]
java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
        at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
        at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

On 8 of the 10 servers, I see this just before the exception:

 INFO [AE-SERVICE-STAGE:1] 2010-06-15 13:41:36,292 StreamOut.java (line 66)
Sending a stream initiate message to /10.210.34.212 ...
ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:43:32,956
DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor

On the other 2 servers, I see the AE-SERVICE stream initiate message about 6-9
minutes prior to the exception.

Another odd thing: even when the server nodes are quiescent because compaction 
is complete, CPU usage stays at about 40%. Even after several hours with no 
reading or writing to the database and all compactions complete, CPU usage 
remains around 40%.

Thank you for your help and advice,
Julie
