cassandra.in.sh is the default; the only thing I changed is the JMX port.
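The only edit there is in the JVM_OPTS block, roughly along these lines (the values shown are illustrative rather than an exact copy of my file):

# sketch of the JVM_OPTS section of cassandra.in.sh; only the
# jmxremote.port line was changed from the stock script
JVM_OPTS=" \
        -Xms128M \
        -Xmx1G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -Dcom.sun.management.jmxremote.port=8081 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"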
storage-conf.xml:

<Storage>
  <ClusterName>Benchmark Cluster</ClusterName>
  <AutoBootstrap>true</AutoBootstrap>
  <HintedHandoffEnabled>true</HintedHandoffEnabled>
  <Keyspaces>
    <Keyspace Name="usertable">
      <ColumnFamily Name="data" CompareWith="UTF8Type"/>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>2</ReplicationFactor>
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  </Keyspaces>
  <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
  <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>
  <InitialToken></InitialToken>
  <CommitLogDirectory>/Developer/Applications/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/Developer/Applications/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
  <Seeds>
    <Seed>192.168.1.2</Seed> <!-- primary node -->
    <Seed>192.168.1.4</Seed> <!-- secondary node -->
  </Seeds>
  <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
  <ListenAddress>192.168.1.2</ListenAddress>
  <StoragePort>7000</StoragePort>
  <ThriftAddress>192.168.1.2</ThriftAddress>
  <ThriftPort>9160</ThriftPort>
  <ThriftFramedTransport>false</ThriftFramedTransport>
  <DiskAccessMode>auto</DiskAccessMode>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>32</ConcurrentWrites>
  <CommitLogSync>periodic</CommitLogSync>
  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
  <GCGraceSeconds>864000</GCGraceSeconds>
</Storage>

When I run both the server and the client locally, with no clustering, I'm not experiencing any delays: it averages around 5,000 ops/sec, maxes out the CPU, and the network card outputs about 11 MB/sec.

Unfortunately, when I try to generate the load remotely, the client is extremely slow. It doesn't seem able to send more than 500 KB/sec, even though I should be able to do at least 1.5 MB/sec, which is what I get when copying files over scp. The laptops are connected wirelessly through a router, so the network was never going to be fast, but this is still too slow.

The client is pure Thrift-based code:
http://github.com/brianfrankcooper/YCSB/blob/master/db/cassandra-0.6/src/com/yahoo/ycsb/db/CassandraClient6.java
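For context, this is roughly the path each operation takes through the 0.6 Thrift API. It is not the YCSB code itself, just a minimal sketch; the host, key, column name and value are made up:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class MinimalInsert {
    public static void main(String[] args) throws Exception {
        // Plain socket, matching ThriftFramedTransport=false in storage-conf.xml
        TTransport transport = new TSocket("192.168.1.2", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // One column in the "data" column family of the "usertable" keyspace
        ColumnPath path = new ColumnPath("data");
        path.setColumn("field0".getBytes("UTF-8"));

        // insert() is a synchronous round trip: the thread blocks until the
        // server acknowledges the write at the requested consistency level
        client.insert("usertable", "user0", path,
                      "example value".getBytes("UTF-8"),
                      System.currentTimeMillis(), ConsistencyLevel.ONE);

        transport.close();
    }
}

Since each insert() is a blocking round trip, the ~29 ms latency I see over the wireless link (numbers below) caps a single client thread at about 1000/29 ≈ 34 inserts/sec, and with roughly 1 KB records (assuming the workload defaults of 10 fields of 100 bytes) even a handful of threads adds up to only a few hundred KB/sec on the wire, regardless of the available bandwidth.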
With both the server and the client on localhost, the latency is < 3 ms:

Starting test.
Starting test.
0 sec: 0 operations;
10 sec: 54012 operations; 5383.43 current ops/sec; [INSERT AverageLatency(ms)=1.7]
20 sec: 102657 operations; 4863.53 current ops/sec; [INSERT AverageLatency(ms)=1.99]
30 sec: 151330 operations; 4867.3 current ops/sec; [INSERT AverageLatency(ms)=1.97]
40 sec: 199265 operations; 4790.15 current ops/sec; [INSERT AverageLatency(ms)=2]
50 sec: 246070 operations; 4676.76 current ops/sec; [INSERT AverageLatency(ms)=2.07]
60 sec: 298864 operations; 5278.87 current ops/sec; [INSERT AverageLatency(ms)=1.81]
70 sec: 340002 operations; 4113.8 current ops/sec; [INSERT AverageLatency(ms)=2.37]
80 sec: 386824 operations; 4682.2 current ops/sec; [INSERT AverageLatency(ms)=2.05]
90 sec: 431027 operations; 4420.3 current ops/sec; [INSERT AverageLatency(ms)=2.18]
100 sec: 483440 operations; 5241.82 current ops/sec; [INSERT AverageLatency(ms)=1.81]
110 sec: 523785 operations; 4034.5 current ops/sec; [INSERT AverageLatency(ms)=2.39]
120 sec: 576850 operations; 5306.5 current ops/sec; [INSERT AverageLatency(ms)=1.79]
130 sec: 622157 operations; 4530.25 current ops/sec; [INSERT AverageLatency(ms)=2.13]
140 sec: 669102 operations; 4694.5 current ops/sec; [INSERT AverageLatency(ms)=2.05]
150 sec: 714394 operations; 4529.2 current ops/sec; [INSERT AverageLatency(ms)=2.13]
160 sec: 760176 operations; 4578.2 current ops/sec; [INSERT AverageLatency(ms)=2.09]
170 sec: 809245 operations; 4906.9 current ops/sec; [INSERT AverageLatency(ms)=1.96]
180 sec: 855002 operations; 4574.33 current ops/sec; [INSERT AverageLatency(ms)=2.11]
190 sec: 904312 operations; 4930.51 current ops/sec; [INSERT AverageLatency(ms)=1.93]
200 sec: 949707 operations; 4539.5 current ops/sec; [INSERT AverageLatency(ms)=2.12]
210 sec: 998662 operations; 4895.99 current ops/sec; [INSERT AverageLatency(ms)=1.71]
210 sec: 1000000 operations; 3387.34 current ops/sec; [INSERT AverageLatency(ms)=0.38]

Remotely the latency is ~30 ms:

Loading workload...
Starting test.
0 sec: 0 operations;
10 sec: 3369 operations; 336.4 current ops/sec; [INSERT AverageLatency(ms)=29.13]
20 sec: 6775 operations; 340.57 current ops/sec; [INSERT AverageLatency(ms)=29.29]
30 sec: 10194 operations; 341.9 current ops/sec; [INSERT AverageLatency(ms)=29.2]
40 sec: 13659 operations; 346.5 current ops/sec; [INSERT AverageLatency(ms)=28.81]
50 sec: 17108 operations; 344.87 current ops/sec; [INSERT AverageLatency(ms)=28.94]
60 sec: 20584 operations; 347.6 current ops/sec; [INSERT AverageLatency(ms)=28.72]
70 sec: 24017 operations; 343.27 current ops/sec; [INSERT AverageLatency(ms)=29.04]
80 sec: 27458 operations; 344.1 current ops/sec; [INSERT AverageLatency(ms)=29]
90 sec: 30939 operations; 348.1 current ops/sec; [INSERT AverageLatency(ms)=28.7]
100 sec: 34399 operations; 346 current ops/sec; [INSERT AverageLatency(ms)=28.83]
110 sec: 37888 operations; 348.9 current ops/sec; [INSERT AverageLatency(ms)=28.61]
120 sec: 41381 operations; 349.27 current ops/sec; [INSERT AverageLatency(ms)=28.59]

When I run the same job with both the server and the client on my second box, it logs repeated ConcurrentMarkSweep GC messages and eventually the node dies:

INFO 23:56:15,739 GC for ConcurrentMarkSweep: 1288 ms, 5201448 reclaimed leaving 1077816616 used; max is 1207828480
INFO 23:56:15,739 Pool Name Active Pending
INFO 23:56:15,742 STREAM-STAGE 0 0
INFO 23:56:15,743 FILEUTILS-DELETE-POOL 0 0
INFO 23:56:15,744 RESPONSE-STAGE 0 0
INFO 23:56:15,744 ROW-READ-STAGE 0 0
INFO 23:56:15,745 LB-OPERATIONS 0 0
INFO 23:56:15,745 MISCELLANEOUS-POOL 0 0
INFO 23:56:15,746 GMFD 0 2
INFO 23:56:15,747 CONSISTENCY-MANAGER 0 0
INFO 23:56:15,747 LB-TARGET 0 0
INFO 23:56:15,748 ROW-MUTATION-STAGE 0 6
INFO 23:56:15,749 MESSAGE-STREAMING-POOL 0 0
INFO 23:56:15,749 LOAD-BALANCER-STAGE 0 0
INFO 23:56:15,750 FLUSH-SORTER-POOL 0 0
INFO 23:56:15,750 MEMTABLE-POST-FLUSHER 1 1
INFO 23:56:15,751 AE-SERVICE-STAGE 0 0
INFO 23:56:15,751 FLUSH-WRITER-POOL 1 1
INFO 23:56:15,752 HINTED-HANDOFF-POOL 0 0
INFO 23:56:15,752 CompactionManager n/a 1
INFO 23:56:17,491 GC for ConcurrentMarkSweep: 1648 ms, 5986176 reclaimed leaving 1077634256 used; max is 1207828480
INFO 23:56:17,492 Pool Name Active Pending
INFO 23:56:17,501 STREAM-STAGE 0 0
INFO 23:56:17,501 FILEUTILS-DELETE-POOL 0 0
INFO 23:56:17,502 RESPONSE-STAGE 0 1
INFO 23:56:17,502 ROW-READ-STAGE 0 0
INFO 23:56:17,503 LB-OPERATIONS 0 0
INFO 23:56:17,503 MISCELLANEOUS-POOL 0 0
INFO 23:56:17,504 GMFD 0 0
INFO 23:56:17,504 CONSISTENCY-MANAGER 0 0
INFO 23:56:17,504 LB-TARGET 0 0
INFO 23:56:17,505 ROW-MUTATION-STAGE 0 2
INFO 23:56:17,505 MESSAGE-STREAMING-POOL 0 0
INFO 23:56:17,508 LOAD-BALANCER-STAGE 0 0
INFO 23:56:17,514 FLUSH-SORTER-POOL 0 0
INFO 23:56:17,515 MEMTABLE-POST-FLUSHER 1 1
INFO 23:56:17,519 AE-SERVICE-STAGE 0 0
INFO 23:56:17,527 FLUSH-WRITER-POOL 1 1
INFO 23:56:17,528 HINTED-HANDOFF-POOL 0 0
INFO 23:56:18,913 CompactionManager n/a 1
INFO 23:56:20,591 GC for ConcurrentMarkSweep: 1675 ms, 6052824 reclaimed leaving 1077609920 used; max is 1207828480
INFO 23:56:20,592 Pool Name Active Pending
INFO 23:56:20,611 STREAM-STAGE 0 0
INFO 23:56:20,612 FILEUTILS-DELETE-POOL 0 0
INFO 23:56:20,613 RESPONSE-STAGE 2 158
INFO 23:56:20,613 ROW-READ-STAGE 0 0
INFO 23:56:20,614 LB-OPERATIONS 0 0
INFO 23:56:20,614 MISCELLANEOUS-POOL 0 0
INFO 23:56:20,615 GMFD 0 0
INFO 23:56:20,616 CONSISTENCY-MANAGER 0 0
INFO 23:56:20,616 LB-TARGET 0 0
INFO 23:56:20,617 ROW-MUTATION-STAGE 0 1
INFO 23:56:20,617 MESSAGE-STREAMING-POOL 0 0
INFO 23:56:20,618 LOAD-BALANCER-STAGE 0 0
INFO 23:56:20,625 FLUSH-SORTER-POOL 0 0
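If I'm reading those GC lines right, each ConcurrentMarkSweep pass takes 1.3-1.7 seconds and reclaims only 5-6 MB, leaving about 1.08 GB used out of a roughly 1.2 GB max, so the heap stays around 90% full and the JVM spends most of its time collecting. My guess is that the heap is simply too small for running the server and the client plus this write rate on one laptop; I'll try raising -Xmx (see the JVM_OPTS sketch above) or lowering MemtableThroughputInMB, but I may well be missing something.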
The problem seems to be with the second node... any ideas?

On 28 August 2010 22:49, Benjamin Black <b...@b3k.us> wrote:
> cassandra.in.sh?
> storage-conf.xml?
> output of iostat -x while this is going on?
> turn GC log level to debug?
>
> On Sat, Aug 28, 2010 at 2:02 PM, Fernando Racca <fra...@gmail.com> wrote:
> > Hi,
> > I'm currently executing some benchmarks against 0.6.5, which I plan to
> > compare against 0.7-beta1, using the YCSB client.
> > I'm experiencing some strange behaviour when running a small 2-node
> > cluster using OrderPreservingPartitioner. Does anybody have any
> > experience on using the client to generate load?
> > It's the first benchmark that I try, so I'm probably doing something dumb.
> > A detailed post with screenshots of the VM and CPU history can be seen
> > in this post:
> > http://quantleap.blogspot.com/2010/08/cassandra-065-benchmarking-first-run.html
> > I would very much appreciate your help, since I'm doing these benchmarks
> > as part of my master's dissertation.
> > A previous official benchmark is documented here:
> > http://research.yahoo.com/files/ycsb-v4.pdf
> > Thanks!
> > Fernando Racca