Hi,

I have a serious problem with counter performance that I can't seem to
figure out.

Basically, I'm building a system that accumulates statistics "on the fly"
using Cassandra distributed counters. For this I need counter updates to be
really fast, and herein lies my problem: as soon as I enable
replication_factor=2, performance goes down the drain. This happens in my
tests with both 1.0.x and 1.1.6.

Let me elaborate:

I have two boxes: virtual servers running on physical servers rented
specifically for this purpose (i.e. it's not a cloud, nor is it shared; I
suppose our admins run them as virtual servers as a way to limit damage :)).
The Cassandra partitioner is set to ByteOrderedPartitioner because I want to
be able to do some range queries.
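To illustrate why ByteOrderedPartitioner is what makes range queries possible, here is a minimal Python sketch (not Cassandra code) comparing how row keys are ordered on the ring. The byte-ordered token is just the raw key; the hash-based token only roughly mirrors RandomPartitioner's MD5-derived tokens. The key names are hypothetical.

```python
import hashlib

def bop_token(key: bytes) -> bytes:
    # ByteOrderedPartitioner-style: the token is the raw key bytes,
    # so rows are placed in lexical byte order.
    return key

def random_token(key: bytes) -> int:
    # RandomPartitioner-style (approximation): the token is derived from
    # an MD5 hash of the key, so adjacent keys are scattered around the ring.
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

# Hypothetical row keys, e.g. per-day statistics buckets (already sorted).
keys = [b"stats:2012-11-0%d" % d for d in range(1, 6)]

by_bop = sorted(keys, key=bop_token)
by_random = sorted(keys, key=random_token)

print(by_bop == keys)  # True: a key range maps to a contiguous token range
print(by_random)       # some hash-determined permutation of the same keys
```

With byte ordering, "all keys between A and B" is one contiguous slice of the ring; with hashed tokens the same keys land wherever their hashes fall.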

First, I set up Cassandra individually on each box (not in a cluster) and
test counter increment performance (increments only, no reads). The test
code is intended to roughly resemble the expected load pattern: in
particular, the majority of increments create new counters, while the rest
add to already existing counters. In this test each node on its own shows
respectable performance, on the order of 70k (seventy thousand) increments
per second.
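For reference, the load pattern described above could be sketched roughly like this in Python. It only generates the stream of increments and does not talk to Cassandra; the 80% "new counter" ratio and the key naming scheme are hypothetical, since the actual test code only aims for "the majority" of increments creating new counters.

```python
import random

def increment_stream(n, new_fraction=0.8, seed=42):
    """Yield (counter_key, delta) pairs: most increments create a brand-new
    counter, the rest add to a randomly chosen existing one.
    new_fraction is a hypothetical ratio, not taken from the real test."""
    rng = random.Random(seed)
    existing = []
    for i in range(n):
        if not existing or rng.random() < new_fraction:
            key = "counter-%d" % i  # hypothetical key scheme
            existing.append(key)
        else:
            key = rng.choice(existing)
        yield key, 1

ops = list(increment_stream(10_000))
distinct = len({k for k, _ in ops})
print(len(ops), distinct)  # distinct counters are roughly 80% of increments
```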

I then join both nodes into a single cluster (using SimpleSnitch and
SimpleStrategy, nothing fancy yet) and run the same test with
replication_factor=1. Performance is on the order of 120k increments per
second, which seems a reasonable increase over single-node performance.


HOWEVER, I then rerun the same test on the two-node cluster with
replication_factor=2, which is the minimum I'll need in production for
redundancy. The performance I get is absolutely horrible: much, MUCH worse
than even single-node performance, below 25k increments per second. Besides
clients not being able to push updates fast enough, I also see a lot of
'messages dropped' messages in the Cassandra log under this load.

Could anyone advise what might be causing such a drastic performance drop
with replication_factor=2? I was expecting something on the order of
single-node performance, not roughly 3x less.
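The expectation above rests on some simple arithmetic, written out here in Python. The model is an assumption: it treats every replica write as equally expensive and assumes the cluster-wide write capacity seen with replication_factor=1 stays the same, ignoring anything counter-specific.

```python
single_node = 70_000   # increments/s, one standalone node
cluster_rf1 = 120_000  # increments/s, two nodes, replication_factor=1
observed_rf2 = 25_000  # upper bound observed with replication_factor=2

# Assumed model: with RF=1 each client increment is one replica write,
# so the two-node cluster absorbs about 120k replica writes/s.
replica_write_capacity = cluster_rf1 * 1

# With RF=2 each client increment becomes two replica writes.
expected_rf2 = replica_write_capacity / 2

print(expected_rf2)                           # 60000.0
print(round(single_node / observed_rf2, 1))   # 2.8
```

So the naive expectation is around 60k increments per second, close to single-node performance, while the observed rate is about 2.8x below a single node.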


When testing replication_factor=2 on 1.1.6, I can see CPU usage go through
the roof. On 1.0.x I think it looked more like disk overload, but I'm not
sure (being on a virtual server, I apparently can't see true iostats).

Cassandra data is on a separate disk; the commit log and cache are currently
on the same disk as the system. I experimented with commit log flush modes
and even disabled the commit log entirely, but neither seems to have a
noticeable impact on performance under replication_factor=2.


Any suggestions and hints will be much appreciated :) And please let me know
if I should share more information about the configuration I'm running.

Best regards,
Sergey



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.
