That means you only have a 1G heap, so it's no surprise the node dies; it is most likely running out of memory (the CMS runs themselves are not inherently bad, they just show the heap is nearly full). I don't immediately see why the remote latency goes up that high, but it is unlikely to be a Cassandra problem.
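If it is an OOM, the quickest check is to raise the heap in cassandra.in.sh and rerun the load. A minimal sketch of the change, assuming the stock ~1G -Xms/-Xmx defaults and that the laptop has the RAM to spare (2G is just an example value, and you should keep the GC/JMX flags already in your copy of the file):

    # cassandra.in.sh (sketch) -- bump min/max heap from the ~1G default;
    # merge with the GC/JMX options that are already present in the file
    JVM_OPTS=" \
            -ea \
            -Xms2G \
            -Xmx2G"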
On Sat, Aug 28, 2010 at 4:01 PM, Fernando Racca <fra...@gmail.com> wrote:
> cassandra.in.sh is the default, I just changed the JMX port.
>
> storage-conf.xml:
>
> <Storage>
>   <ClusterName>Benchmark Cluster</ClusterName>
>   <AutoBootstrap>true</AutoBootstrap>
>   <HintedHandoffEnabled>true</HintedHandoffEnabled>
>   <Keyspaces>
>     <Keyspace Name="usertable">
>       <ColumnFamily Name="data" CompareWith="UTF8Type"/>
>       <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>       <ReplicationFactor>2</ReplicationFactor>
>       <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
>     </Keyspace>
>   </Keyspaces>
>   <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
>   <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>
>   <InitialToken></InitialToken>
>   <CommitLogDirectory>/Developer/Applications/cassandra/commitlog</CommitLogDirectory>
>   <DataFileDirectories>
>     <DataFileDirectory>/Developer/Applications/cassandra/data</DataFileDirectory>
>   </DataFileDirectories>
>   <Seeds>
>     <Seed>192.168.1.2</Seed> <!-- primary node -->
>     <Seed>192.168.1.4</Seed> <!-- secondary node -->
>   </Seeds>
>   <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
>   <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
>   <ListenAddress>192.168.1.2</ListenAddress>
>   <StoragePort>7000</StoragePort>
>   <ThriftAddress>192.168.1.2</ThriftAddress>
>   <ThriftPort>9160</ThriftPort>
>   <ThriftFramedTransport>false</ThriftFramedTransport>
>   <DiskAccessMode>auto</DiskAccessMode>
>   <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>   <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>   <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>   <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>   <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>   <MemtableThroughputInMB>64</MemtableThroughputInMB>
>   <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>   <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>   <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>   <ConcurrentReads>8</ConcurrentReads>
>   <ConcurrentWrites>32</ConcurrentWrites>
>   <CommitLogSync>periodic</CommitLogSync>
>   <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>   <GCGraceSeconds>864000</GCGraceSeconds>
> </Storage>
>
> When I run both server and client locally, with no clustering, I'm not
> experiencing any delays: it averages ~5000 ops a second, maxes out the CPU,
> and the network card outputs 11 MB/sec.
>
> Unfortunately, when trying to generate load remotely, the client is uber
> slow. It seems unable to send more than 500 KB/sec, even though I should be
> able to do at least 1.5 MB/sec, e.g. when copying over scp. My laptops are
> connected wirelessly through a router, so network speed is not meant to be
> great, but this is too slow.
>
> The client is pure Thrift-based code:
> http://github.com/brianfrankcooper/YCSB/blob/master/db/cassandra-0.6/src/com/yahoo/ycsb/db/CassandraClient6.java
>
> On localhost the latency is <3 ms:
>
> Starting test.
> Starting test.
> 0 sec: 0 operations;
> 10 sec: 54012 operations; 5383.43 current ops/sec; [INSERT AverageLatency(ms)=1.7]
> 20 sec: 102657 operations; 4863.53 current ops/sec; [INSERT AverageLatency(ms)=1.99]
> 30 sec: 151330 operations; 4867.3 current ops/sec; [INSERT AverageLatency(ms)=1.97]
> 40 sec: 199265 operations; 4790.15 current ops/sec; [INSERT AverageLatency(ms)=2]
> 50 sec: 246070 operations; 4676.76 current ops/sec; [INSERT AverageLatency(ms)=2.07]
> 60 sec: 298864 operations; 5278.87 current ops/sec; [INSERT AverageLatency(ms)=1.81]
> 70 sec: 340002 operations; 4113.8 current ops/sec; [INSERT AverageLatency(ms)=2.37]
> 80 sec: 386824 operations; 4682.2 current ops/sec; [INSERT AverageLatency(ms)=2.05]
> 90 sec: 431027 operations; 4420.3 current ops/sec; [INSERT AverageLatency(ms)=2.18]
> 100 sec: 483440 operations; 5241.82 current ops/sec; [INSERT AverageLatency(ms)=1.81]
> 110 sec: 523785 operations; 4034.5 current ops/sec; [INSERT AverageLatency(ms)=2.39]
> 120 sec: 576850 operations; 5306.5 current ops/sec; [INSERT AverageLatency(ms)=1.79]
> 130 sec: 622157 operations; 4530.25 current ops/sec; [INSERT AverageLatency(ms)=2.13]
> 140 sec: 669102 operations; 4694.5 current ops/sec; [INSERT AverageLatency(ms)=2.05]
> 150 sec: 714394 operations; 4529.2 current ops/sec; [INSERT AverageLatency(ms)=2.13]
> 160 sec: 760176 operations; 4578.2 current ops/sec; [INSERT AverageLatency(ms)=2.09]
> 170 sec: 809245 operations; 4906.9 current ops/sec; [INSERT AverageLatency(ms)=1.96]
> 180 sec: 855002 operations; 4574.33 current ops/sec; [INSERT AverageLatency(ms)=2.11]
> 190 sec: 904312 operations; 4930.51 current ops/sec; [INSERT AverageLatency(ms)=1.93]
> 200 sec: 949707 operations; 4539.5 current ops/sec; [INSERT AverageLatency(ms)=2.12]
> 210 sec: 998662 operations; 4895.99 current ops/sec; [INSERT AverageLatency(ms)=1.71]
> 210 sec: 1000000 operations; 3387.34 current ops/sec; [INSERT AverageLatency(ms)=0.38]
>
> remotely is ~30 ms:
>
> Loading workload...
> Starting test.
> 0 sec: 0 operations;
> 10 sec: 3369 operations; 336.4 current ops/sec; [INSERT AverageLatency(ms)=29.13]
> 20 sec: 6775 operations; 340.57 current ops/sec; [INSERT AverageLatency(ms)=29.29]
> 30 sec: 10194 operations; 341.9 current ops/sec; [INSERT AverageLatency(ms)=29.2]
> 40 sec: 13659 operations; 346.5 current ops/sec; [INSERT AverageLatency(ms)=28.81]
> 50 sec: 17108 operations; 344.87 current ops/sec; [INSERT AverageLatency(ms)=28.94]
> 60 sec: 20584 operations; 347.6 current ops/sec; [INSERT AverageLatency(ms)=28.72]
> 70 sec: 24017 operations; 343.27 current ops/sec; [INSERT AverageLatency(ms)=29.04]
> 80 sec: 27458 operations; 344.1 current ops/sec; [INSERT AverageLatency(ms)=29]
> 90 sec: 30939 operations; 348.1 current ops/sec; [INSERT AverageLatency(ms)=28.7]
> 100 sec: 34399 operations; 346 current ops/sec; [INSERT AverageLatency(ms)=28.83]
> 110 sec: 37888 operations; 348.9 current ops/sec; [INSERT AverageLatency(ms)=28.61]
> 120 sec: 41381 operations; 349.27 current ops/sec; [INSERT AverageLatency(ms)=28.59]
>
> When I run the same job with both server and client on my second box, it logs
> repeated ConcurrentMarkSweep GC messages and eventually the node dies:
>
> INFO 23:56:15,739 GC for ConcurrentMarkSweep: 1288 ms, 5201448 reclaimed leaving 1077816616 used; max is 1207828480
> INFO 23:56:15,739 Pool Name                  Active   Pending
> INFO 23:56:15,742 STREAM-STAGE                    0         0
> INFO 23:56:15,743 FILEUTILS-DELETE-POOL           0         0
> INFO 23:56:15,744 RESPONSE-STAGE                  0         0
> INFO 23:56:15,744 ROW-READ-STAGE                  0         0
> INFO 23:56:15,745 LB-OPERATIONS                   0         0
> INFO 23:56:15,745 MISCELLANEOUS-POOL              0         0
> INFO 23:56:15,746 GMFD                            0         2
> INFO 23:56:15,747 CONSISTENCY-MANAGER             0         0
> INFO 23:56:15,747 LB-TARGET                       0         0
> INFO 23:56:15,748 ROW-MUTATION-STAGE              0         6
> INFO 23:56:15,749 MESSAGE-STREAMING-POOL          0         0
> INFO 23:56:15,749 LOAD-BALANCER-STAGE             0         0
> INFO 23:56:15,750 FLUSH-SORTER-POOL               0         0
> INFO 23:56:15,750 MEMTABLE-POST-FLUSHER           1         1
> INFO 23:56:15,751 AE-SERVICE-STAGE                0         0
> INFO 23:56:15,751 FLUSH-WRITER-POOL               1         1
> INFO 23:56:15,752 HINTED-HANDOFF-POOL             0         0
> INFO 23:56:15,752 CompactionManager             n/a         1
> INFO 23:56:17,491 GC for ConcurrentMarkSweep: 1648 ms, 5986176 reclaimed leaving 1077634256 used; max is 1207828480
> INFO 23:56:17,492 Pool Name                  Active   Pending
> INFO 23:56:17,501 STREAM-STAGE                    0         0
> INFO 23:56:17,501 FILEUTILS-DELETE-POOL           0         0
> INFO 23:56:17,502 RESPONSE-STAGE                  0         1
> INFO 23:56:17,502 ROW-READ-STAGE                  0         0
> INFO 23:56:17,503 LB-OPERATIONS                   0         0
> INFO 23:56:17,503 MISCELLANEOUS-POOL              0         0
> INFO 23:56:17,504 GMFD                            0         0
> INFO 23:56:17,504 CONSISTENCY-MANAGER             0         0
> INFO 23:56:17,504 LB-TARGET                       0         0
> INFO 23:56:17,505 ROW-MUTATION-STAGE              0         2
> INFO 23:56:17,505 MESSAGE-STREAMING-POOL          0         0
> INFO 23:56:17,508 LOAD-BALANCER-STAGE             0         0
> INFO 23:56:17,514 FLUSH-SORTER-POOL               0         0
> INFO 23:56:17,515 MEMTABLE-POST-FLUSHER           1         1
> INFO 23:56:17,519 AE-SERVICE-STAGE                0         0
> INFO 23:56:17,527 FLUSH-WRITER-POOL               1         1
> INFO 23:56:17,528 HINTED-HANDOFF-POOL             0         0
> INFO 23:56:18,913 CompactionManager             n/a         1
> INFO 23:56:20,591 GC for ConcurrentMarkSweep: 1675 ms, 6052824 reclaimed leaving 1077609920 used; max is 1207828480
> INFO 23:56:20,592 Pool Name                  Active   Pending
> INFO 23:56:20,611 STREAM-STAGE                    0         0
> INFO 23:56:20,612 FILEUTILS-DELETE-POOL           0         0
> INFO 23:56:20,613 RESPONSE-STAGE                  2       158
> INFO 23:56:20,613 ROW-READ-STAGE                  0         0
> INFO 23:56:20,614 LB-OPERATIONS                   0         0
> INFO 23:56:20,614 MISCELLANEOUS-POOL              0         0
> INFO 23:56:20,615 GMFD                            0         0
> INFO 23:56:20,616 CONSISTENCY-MANAGER             0         0
> INFO 23:56:20,616 LB-TARGET                       0         0
> INFO 23:56:20,617 ROW-MUTATION-STAGE              0         1
> INFO 23:56:20,617 MESSAGE-STREAMING-POOL          0         0
> INFO 23:56:20,618 LOAD-BALANCER-STAGE             0         0
> INFO 23:56:20,625 FLUSH-SORTER-POOL               0         0
>
> The problem seems to be with the second node... any ideas?
>
> On 28 August 2010 22:49, Benjamin Black <b...@b3k.us> wrote:
>>
>> cassandra.in.sh?
>> storage-conf.xml?
>> output of iostat -x while this is going on?
>> turn GC log level to debug?
>>
>> On Sat, Aug 28, 2010 at 2:02 PM, Fernando Racca <fra...@gmail.com> wrote:
>> > Hi,
>> > I'm currently executing some benchmarks against 0.6.5, which I plan to
>> > compare against 0.7-beta1, using the YCSB client.
>> > I'm experiencing some strange behaviour when running a small 2-node
>> > cluster using OrderPreservingPartitioner. Does anybody have experience
>> > using the client to generate load?
>> > It's the first benchmark I have tried, so I'm probably doing something dumb.
>> > A detailed post with screenshots of the VM and CPU history is here:
>> > http://quantleap.blogspot.com/2010/08/cassandra-065-benchmarking-first-run.html
>> > I would very much appreciate your help, since I'm doing these benchmarks
>> > as part of my master's dissertation.
>> > A previous official benchmark is documented here:
>> > http://research.yahoo.com/files/ycsb-v4.pdf
>> > Thanks!
>> > Fernando Racca
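P.S. One more thing worth ruling out on the client side, as a guess rather than a diagnosis: at ~29 ms per insert a single client thread can only manage about 34 ops/sec, so ~345 ops/sec looks like roughly ten threads each blocked on a wireless round trip rather than anything server-side. Driving the loader with more threads should show quickly whether you are limited by client concurrency or by the link itself. A sketch of the invocation, assuming the standard YCSB Client flags and the "hosts" property read by CassandraClient6 (check the class for the exact property names, and adjust the classpath to wherever your build puts the Cassandra binding and its Thrift jars):

    java -cp build/ycsb.jar:db/cassandra-0.6/lib/* com.yahoo.ycsb.Client \
         -load \
         -db com.yahoo.ycsb.db.CassandraClient6 \
         -P workloads/workloada \
         -p hosts=192.168.1.2,192.168.1.4 \
         -threads 50 \
         -s

If throughput scales with the thread count, the ~30 ms per operation is just the wireless RTT and the cluster itself is fine.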