Re: counters + replication = awful performance?

Juan Valencia Tue, 27 Nov 2012 09:39:19 -0800

Hi Sergey,

I've had similar issues with counters that turned out to be bottlenecked by
network throughput.  You might be seeing a throughput problem between the
clients and Cassandra, or between the two Cassandra nodes.  It might not be
your case, but that was what happened to me :-)


Juan


On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.li...@gmail.com> wrote:

> Hi,
>
> I have a serious problem with counter performance and I can't seem to
> figure it out.
>
> Basically I'm building a system for accumulating some statistics "on the
> fly" via Cassandra distributed counters. For this I need counter updates to
> work "really fast" and herein lies my problem -- as soon as I enable
> replication_factor = 2, the performance goes down the drain. This happens
> in my tests using both 1.0.x and 1.1.6.
>
> Let me elaborate:
>
> I have two boxes (virtual servers on top of physical servers rented
> specifically for this purpose, i.e. it's not a cloud, nor is it shared;
> virtual servers are managed by our admins as a way to limit damage, I
> suppose :)). The Cassandra partitioner is set to ByteOrderedPartitioner because
> I want to be able to do some range queries.
>
> First, I set up Cassandra individually on each box (not in a cluster) and
> test counter increment performance (exclusively increments, no reads). For
> tests I use code that is intended to somewhat resemble the expected load
> pattern -- particularly the majority of increments create new counters with
> some updating (adding) to already existing counters. In this test each
> single node exhibits respectable performance - something on the order of
> 70k (seventy thousand) increments per second.
>
> I then join both of these nodes into a single cluster (using SimpleSnitch
> and SimpleStrategy, nothing fancy yet). I then run the same test using
> replication_factor=1. The performance is on the order of 120k increments
> per second -- which seems to be a reasonable increase over the single node
> performance.
>
>
> HOWEVER I then rerun the same test on the two-node cluster using
> replication_factor=2 -- which is the least I'll need for actual production
> for redundancy purposes. And the performance I get is absolutely horrible
> -- much, MUCH worse than even single-node performance -- something on the
> order of less than 25k increments per second. In addition to clients not
> being able to push updates fast enough, I also see a lot of 'messages
> dropped' messages in the Cassandra log under this load.
>
> Could anyone advise what could be causing such a drastic performance drop
> under replication_factor=2? I was expecting something on the order of
> single-node performance, not approximately 3x less.
>
>
> When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes
> through the roof. On 1.0.x I think it looked more like disk overload, but
> I'm not sure (being on a virtual server I apparently can't see true iostats).
>
> I do have Cassandra data on a separate disk; the commit log and cache are
> currently on the same disk as the system. I experimented with commit log
> flush modes and even with disabling the commit log altogether -- but it
> doesn't seem to have a noticeable impact on the performance under
> replication_factor=2.
>
>
> Any suggestions and hints will be much appreciated :) And please let me
> know if I need to share additional information about the configuration I'm
> running on.
>
> Best regards,
> Sergey
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


