Phil Stanhope <pstanhope <at> wimba.com> writes:

> 
> How are you doing your inserts?
> 
> I draw a clear line between 1) bootstrapping a cluster with data and 2)
> simulating expected/projected read/write behavior.
> 
> If you are bootstrapping, then I would look into the batch_mutate APIs. They
> allow you to improve your performance on writes dramatically.
> 
> If you are read/write testing on a populated cluster, insert and batch_insert
> (for super columns) are the way to go.
> 
> As Ben has pointed out to me in numerous threads ... think carefully about
> your replication factor. Do you want the data on all nodes? Or sufficiently
> replicated so that you can recover? Do you want consistency at the time of
> write? Or eventually?
> 
> Cassandra has a bunch of knobs that you can turn ... but that flexibility
> requires that you think about your expected usage patterns and operational
> policies.
> 
> -phil
> 

My inserts are being done 100 rows at a time using batch_mutate().
I bring up all 10 nodes in my Cassandra cluster at once (no live bootstrapping 
of nodes).  Once they are up, I begin populating the database with 8 write 
clients (on 8 different VMs), each writing 100 rows at a time.  As mentioned 
earlier, each client writes to a different Cassandra server node, so no single 
node is fielding all the writes simultaneously.
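In case it helps to see the batching pattern spelled out, here is a minimal sketch in Python. The send_batch callback is a hypothetical stand-in for the real Thrift batch_mutate call, just to illustrate the 100-rows-per-call grouping:

```python
# Sketch of 100-row batched writes. send_batch is a stand-in for the
# actual Thrift batch_mutate call (hypothetical, for illustration only).

BATCH_SIZE = 100

def write_in_batches(rows, send_batch, batch_size=BATCH_SIZE):
    """Group rows into fixed-size batches and hand each batch to send_batch."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            send_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        send_batch(batch)

# Example: 250 rows produce batches of 100, 100, and 50.
sent = []
write_in_batches(range(250), sent.append)
print([len(b) for b in sent])  # → [100, 100, 50]
```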

I have a replication factor of 3 because I need to be able to survive 2 out of 
10 nodes going down at once.
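As a sanity check on that choice, the replica arithmetic works out like this (a sketch; the majority-quorum formula RF/2 + 1 is how Cassandra counts a quorum):

```python
# Worked replica arithmetic for RF=3 with 2 simultaneous node failures.
replication_factor = 3
nodes_down = 2

# Cassandra's quorum size is a majority of the replicas.
quorum = replication_factor // 2 + 1
print(quorum)  # → 2

# Data survives as long as at least one replica of each row remains,
# so losing 2 of 3 replicas does not lose data.
surviving_replicas = replication_factor - nodes_down
print(surviving_replicas >= 1)       # → True

# But a QUORUM read/write on a row that lost 2 of its 3 replicas
# cannot succeed until a node recovers.
print(surviving_replicas >= quorum)  # → False
```

So RF=3 protects the data through a 2-node failure, though quorum-level operations on the worst-hit rows would block until recovery.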

I am baffled by all the "Value too large" exceptions that are occurring on 
every one of my 10 servers:
 ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-14 19:30:24,471 
DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Value too large for defined
data type
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

It seems to be happening just after this is logged:
INFO [AE-SERVICE-STAGE:1]  2010-06-14 19:28:39,851 StreamOut.java

I'm also baffled that after all compactions are done on every one of the 10 
servers, about 5 of the 10 servers are still at 40% CPU usage, even though 
they are doing no disk I/O. I am not running anything on these server nodes 
except Cassandra.  The compactions have been done for over an hour, and the 
last write took place 5 hours ago.

Thank you for any help,
Julie



