Interesting,

First, just to make sure: since ReplicateOnWrite is only used for counters,
you are in fact using counter columns, right? (You use the word "insert"
rather than "add/increment".)
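
For reference, a counter add with hector looks roughly like the sketch below
(assuming hector 0.8.x; the cluster/keyspace/CF/column names are made up):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class CounterAddSketch {
        public static void main(String[] args) {
            // made-up cluster/keyspace names -- substitute your own
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "10.0.0.57:9160");
            Keyspace ksp = HFactory.createKeyspace("TestKS", cluster);
            Mutator<String> m = HFactory.createMutator(ksp, StringSerializer.get());

            // a counter add/increment -- this is what goes through ReplicateOnWrite
            m.incrementCounter("rowkey", "Counters", "colname", 1L);

            // a regular insert writes a normal column and never touches that stage
            m.insert("rowkey", "Standard1",
                    HFactory.createStringColumn("colname", "value"));
        }
    }

If your write path only ever calls insert() on a standard CF, ReplicateOnWrite
shouldn't be involved at all, which would point somewhere else entirely.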

If you are using counters, the ReplicateOnWrite work is supposed to be done
by the replica acting as leader for each write, so it looks like all of your
adds are ending up with that one box as leader. That's probably worth
investigating.
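
One quick way to check is to watch the ReplicateOnWrite pending count on each
of the four nodes over JMX while the test runs (you already have jmxtrans
hooked up, so you may have this graphed) -- if only one node ever shows
pending tasks, the increments really are all leadered by that box. A rough
sketch, assuming the stage MBean is registered as
org.apache.cassandra.request:type=ReplicateOnWriteStage and the default JMX
port of 7199:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RowPendingCheck {
        public static void main(String[] args) throws Exception {
            String[] hosts = {"10.0.0.54", "10.0.0.55", "10.0.0.56", "10.0.0.57"};
            // assuming this is where the stage registers itself
            ObjectName stage = new ObjectName(
                    "org.apache.cassandra.request:type=ReplicateOnWriteStage");
            for (String host : hosts) {
                JMXServiceURL url = new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
                JMXConnector jmxc = JMXConnectorFactory.connect(url);
                try {
                    MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                    // same numbers nodetool tpstats reports, but per node over time
                    System.out.println(host
                            + "  pending=" + mbs.getAttribute(stage, "PendingTasks")
                            + "  completed=" + mbs.getAttribute(stage, "CompletedTasks"));
                } finally {
                    jmxc.close();
                }
            }
        }
    }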

Finally, you could try debugging into those ReplicateOnWrite tasks to see
exactly what they are doing; maybe something obviously wrong will jump out.
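
If attaching a debugger is a pain, a jstack against the cassandra pid (or a
ThreadMXBean dump over JMX, roughly like the sketch below) will at least show
what the ReplicateOnWrite workers are sitting on. This assumes the worker
threads are named after the stage ("ReplicateOnWriteStage:1", etc.) -- adjust
the prefix if yours are named differently:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RowThreadDump {
        public static void main(String[] args) throws Exception {
            // pick whichever node has the backlog; 7199 is the default JMX port
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://10.0.0.57:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        jmxc.getMBeanServerConnection(),
                        ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                // dump everything, keep only the ReplicateOnWrite workers
                for (ThreadInfo ti : threads.dumpAllThreads(false, false)) {
                    if (ti.getThreadName().startsWith("ReplicateOnWrite")) {
                        System.out.print(ti);  // name, state, top stack frames
                    }
                }
            } finally {
                jmxc.close();
            }
        }
    }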



On Mon, Jul 11, 2011 at 11:57 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
> This is now my fourth attempt to get the message through.  Apologies if you 
> see multiple copies.
>
> I've tried to give as much relevant data as I can think of, but please let me 
> know if you need any other info.  I spent the day getting jmxtrans to talk to 
> statsd with the cassandra JMX data, so I can provide graphs of just about 
> anything you want.  I can also re-run the tests to replicate the problem if 
> need be.  As always, thanks for taking a look!
>
> I have a 4-host test cluster running 0.8.1 that I'm writing counters into
> with hector at CL.ONE or CL.QUORUM (the test was run with each, and both gave
> the same results).  The partitioner is the RandomPartitioner, and RF=3.
> nodetool ring shows that the actual data is well balanced:
>
> Address      DC          Rack   Status State   Load      Owns     Token
>                                                                   127605887595351923798765477786913079296
> 10.0.0.57    datacenter1 rack1  Up     Normal  1.59 GB   25.00%   0
> 10.0.0.56    datacenter1 rack1  Up     Normal  1.39 GB   25.00%   42535295865117307932921825928971026432
> 10.0.0.55    datacenter1 rack1  Up     Normal  1.24 GB   25.00%   85070591730234615865843651857942052864
> 10.0.0.54    datacenter1 rack1  Up     Normal  1.34 GB   25.00%   127605887595351923798765477786913079296
>
> I also tested against a 1-box test cluster with RF=1.
>
> Both show the same results:  high performance for a while, and then the 
> ReplicateOnWrite pending count backs way up (I've seen it as high as a million), the 
> cassandra process becomes unresponsive, and the hector client starts throwing 
> exceptions.  High performance in this case equates to about 5-20k inserts/sec 
> on both the 4-box and 1-box cluster (I'm inserting one long row at a time, so 
> the performance is bounded by the one box owning the row).  The cassandra 
> process never seems to recover, even if left for 24 hours.  It's still 
> chewing through those ReplicateOnWrite pending tasks.
>
> GC doesn't seem to be a factor.  Logs show GCs completing in < 1s on all 4 
> boxes in the test cluster, as well as on the 1 standalone server.
>
> I'm also seeing highly erratic performance from the cluster in general:  
> inserts/sec usually start at around 3k, but then maybe creep up towards 6k, 
> then drop back down to below 1k for an extended period of time, then maybe 1% 
> of the time they'll spike up to 8k or even 15k, but that doesn't last more 
> than 30 seconds.
>
> Here's the tpstats output from all 4 boxes:
>
> Slowest/most problematic/highest cpu usage one:
>
> This tpstats output is also representative of the one box cluster about 20 
> minutes into my insert run.
>
> 4 $ ./nodetool -h localhost tpstats
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0              0
> RequestResponseStage              0         0       59095840
> MutationStage                     0         0       31182942
> ReadRepairStage                   0         0              0
> ReplicateOnWriteStage            32    928042       28246716
> GossipStage                       1         1         449464
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              4
> MemtablePostFlusher               0         0            445
> StreamStage                       0         0              0
> FlushWriter                       0         0            445
> FILEUTILS-DELETE-POOL             0         0             38
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              4
> HintedHandoff                     0         0              0
>
> Second highest cpu usage one:
>
> 3 $ ./nodetool -h localhost tpstats
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0              0
> RequestResponseStage              0         0       27413910
> MutationStage                     0         0       27523094
> ReadRepairStage                   0         0              0
> ReplicateOnWriteStage             0         0       11111971
> GossipStage                       0         0         335422
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              4
> MemtablePostFlusher               0         0             66
> StreamStage                       0         0              0
> FlushWriter                       0         0             66
> FILEUTILS-DELETE-POOL             0         0             31
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              4
> HintedHandoff                     0         0              0
>
> The other 2 that see minimal cpu usage:
>
> 2 $ ./nodetool -h localhost tpstats
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0              0
> RequestResponseStage              0         0        9213999
> MutationStage                     0         0       33233114
> ReadRepairStage                   0         0              0
> ReplicateOnWriteStage             0         0              0
> GossipStage                       0         0         336016
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              4
> MemtablePostFlusher               0         0             77
> StreamStage                       0         0              0
> FlushWriter                       0         0             77
> FILEUTILS-DELETE-POOL             0         0             40
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              4
> HintedHandoff                     0         0              0
>
>
> 1 $ ./nodetool -h localhost tpstats
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0              0
> RequestResponseStage              0         0       13796875
> MutationStage                     0         0       36116662
> ReadRepairStage                   0         0              0
> ReplicateOnWriteStage             0         0              0
> GossipStage                       0         0         336496
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              4
> MemtablePostFlusher               0         0             91
> StreamStage                       0         0              0
> FlushWriter                       0         0             91
> FILEUTILS-DELETE-POOL             0         0             43
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              4
> HintedHandoff                     0         0              0
