Are you writing with QUORUM consistency or ONE?
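
At RF=2, QUORUM means every increment has to be acknowledged by both
replicas, so the client's write consistency level matters a lot here. If
you can go through cqlsh, you can pin the level per statement and compare.
A rough sketch only -- CQL 2 syntax as on 1.0/1.1, with placeholder
keyspace/column family/column names ('stats', 'counters', 'hits'):

    USE stats;

    -- increment acknowledged by a single replica
    UPDATE counters USING CONSISTENCY ONE
       SET hits = hits + 1 WHERE KEY = 'row1';

    -- increment acknowledged by a quorum (= both nodes at RF=2)
    UPDATE counters USING CONSISTENCY QUORUM
       SET hits = hits + 1 WHERE KEY = 'row1';

The Java clients (Hector, Astyanax, etc.) have equivalent per-operation or
per-keyspace consistency settings, so it's worth checking what your test
client defaults to.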

On 11/27/12 9:52 AM, "Sergey Olefir" <solf.li...@gmail.com> wrote:

> Hi Juan,
>
> thanks for your input!
>
> In my case, however, I doubt this is the problem -- clients are able to
> push far more updates than are needed to saturate the
> replication_factor=2 case (e.g. I'm pushing as many as 6x more increments
> when testing the 2-node cluster with replication_factor=1), so bandwidth
> between the clients and the server should be sufficient.
>
> Bandwidth between the nodes in the cluster should also be quite
> sufficient, since they are both in the same DC. But it is something to
> check, thanks!
>
> Best regards,
> Sergey
>
>
> Juan Valencia wrote:
>> Hi Sergey,
>>
>> I know I've had similar issues with counters which were bottle-necked by
>> network throughput. You might be seeing a problem with throughput
>> between the clients and Cass, or between the two Cass nodes. It might
>> not be your case, but that was what happened to me :-)
>>
>> Juan
>>
>>
>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...> wrote:
>>
>>> Hi,
>>>
>>> I have a serious problem with counter performance and I can't seem to
>>> figure it out.
>>>
>>> Basically I'm building a system for accumulating some statistics "on
>>> the fly" via Cassandra distributed counters. For this I need counter
>>> updates to work "really fast", and herein lies my problem -- as soon as
>>> I enable replication_factor=2, performance goes down the drain. This
>>> happens in my tests on both 1.0.x and 1.1.6.
>>>
>>> Let me elaborate:
>>>
>>> I have two boxes (virtual servers on top of physical servers rented
>>> specifically for this purpose, i.e. it's not a cloud, nor is it shared;
>>> the virtual servers are managed by our admins as a way to limit damage,
>>> I suppose :)). The Cassandra partitioner is set to
>>> ByteOrderedPartitioner because I want to be able to do some range
>>> queries.
>>>
>>> First, I set up Cassandra individually on each box (not in a cluster)
>>> and test counter increment performance (exclusively increments, no
>>> reads). For these tests I use code that is intended to somewhat
>>> resemble the expected load pattern -- in particular, the majority of
>>> increments create new counters, with some updates (adds) to already
>>> existing counters. In this test each single node exhibits respectable
>>> performance -- something on the order of 70k (seventy thousand)
>>> increments per second.
>>>
>>> I then join both of these nodes into a single cluster (using
>>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the same
>>> test with replication_factor=1. The performance is on the order of 120k
>>> increments per second -- which seems a reasonable increase over the
>>> single-node performance.
>>>
>>> HOWEVER, I then rerun the same test on the two-node cluster with
>>> replication_factor=2 -- which is the least I'll need in actual
>>> production for redundancy purposes. The performance I get is absolutely
>>> horrible -- much, MUCH worse than even single-node performance --
>>> something on the order of less than 25k increments per second. In
>>> addition to clients not being able to push updates fast enough, I also
>>> see a lot of 'messages dropped' entries in the Cassandra log under this
>>> load.
>>>
>>> Could anyone advise what could be causing such a drastic performance
>>> drop under replication_factor=2? I was expecting something on the order
>>> of single-node performance, not approximately 3x less.
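>>>
>>> For reference, the setup itself is nothing special -- just a plain
>>> counter column family in a SimpleStrategy keyspace, roughly along these
>>> lines (cassandra-cli syntax; the names here are placeholders rather
>>> than my actual schema):
>>>
>>>     create keyspace stats
>>>       with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>>>       and strategy_options = {replication_factor:2};
>>>
>>>     use stats;
>>>
>>>     create column family counters
>>>       with default_validation_class = CounterColumnType
>>>       and comparator = UTF8Type
>>>       and key_validation_class = UTF8Type;
>>>
>>>     incr counters['row1']['hits'];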
>>>
>>> When testing replication_factor=2 on 1.1.6 I can see that CPU usage
>>> goes through the roof. On 1.0.x I think it looked more like disk
>>> overload, but I'm not sure (being on a virtual server I apparently
>>> can't see the true iostats).
>>>
>>> I do have the Cassandra data on a separate disk; the commit log and
>>> caches are currently on the same disk as the system. I experimented
>>> with the commit log flush modes and even with disabling the commit log
>>> entirely -- but none of that seems to have a noticeable impact on
>>> performance under replication_factor=2.
>>>
>>> Any suggestions and hints will be much appreciated :) And please let me
>>> know if I need to share additional information about the configuration
>>> I'm running.
>>>
>>> Best regards,
>>> Sergey
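
Also, when the dropped-message warnings show up, 'nodetool tpstats' on each
node should tell you which message types are being dropped and which thread
pools are backing up, and the commit log behaviour you mention is driven by
a couple of cassandra.yaml settings. A quick sketch (the values shown are
just the 1.1 defaults; adjust hosts and paths for your setup):

    # run on each node while the test is under way
    nodetool -h <node-address> tpstats

    # cassandra.yaml -- commit log placement and sync mode (defaults shown)
    commitlog_directory: /var/lib/cassandra/commitlog
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

Since the commit log currently shares the system disk, moving
commitlog_directory to its own spindle sometimes helps under heavy write
load as well.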