I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes such a huge performance drop when moving from replication_factor=1 to replication_factor=2 for counters -- and, more importantly, how to resolve the problem. A 2x-3x drop when moving from replication_factor=1 to replication_factor=2 on two nodes would be reasonable; 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication_factor=1 in parallel rather than rely on Cassandra replication.
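For context, the in-memory aggregation I already do is roughly the following (a minimal sketch; CounterClient is a hypothetical stand-in for the actual client library, not a real API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the real Cassandra client. */
interface CounterClient {
    void increment(String rowKey, String column, long delta);
}

/** Buffers increments in memory; flush() writes one net delta per counter. */
public class CounterBuffer {
    private final ConcurrentHashMap<String, AtomicLong> pending =
            new ConcurrentHashMap<String, AtomicLong>();
    private final CounterClient client;

    public CounterBuffer(CounterClient client) {
        this.client = client;
    }

    /** Called once per event; purely in-memory, cheap. */
    public void add(String counterKey, long delta) {
        AtomicLong sum = pending.get(counterKey);
        if (sum == null) {
            AtomicLong fresh = new AtomicLong();
            AtomicLong raced = pending.putIfAbsent(counterKey, fresh);
            sum = (raced != null) ? raced : fresh;
        }
        sum.addAndGet(delta);
    }

    /** Called periodically; n adds to one key become a single increment. */
    public void flush() {
        for (Map.Entry<String, AtomicLong> e : pending.entrySet()) {
            long delta = e.getValue().getAndSet(0);
            if (delta != 0) {
                client.increment(e.getKey(), "count", delta);
            }
        }
    }
}

Even with this batching in place, the periodic flush still has to keep up with the incoming rate, which is exactly where the replication_factor=2 slowdown hurts.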
And yes, Rainbird was the inspiration for what we are trying to do here :)

Edward Capriolo wrote:
> Cassandra's counters read on increment. Additionally they are distributed,
> so there can be multiple reads per increment. If they are not fast enough
> and you have exhausted all tuning options, add more servers to handle the
> load.
>
> In many cases incrementing the same counter n times can be avoided.
>
> Twitter's Rainbird did just that. It avoided multiple counter increments
> by batching them.
>
> I have done a similar thing using Cassandra and Kafka.
>
> https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
>
> On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@...> wrote:
>> Hi, thanks for your suggestions.
>>
>> Regarding replicate=2 vs. replicate=1 performance: I expected the
>> following configurations to perform similarly:
>> - single node, replication_factor=1
>> - two nodes, replication_factor=2 (okay, this probably should be a bit
>>   slower due to additional overhead).
>>
>> However, what I'm seeing is that the second option (replication_factor=2)
>> is about THREE times slower than the single node.
>>
>> Regarding replicate_on_write -- it is, in fact, a dangerous option. As
>> the JIRA discusses, if you make changes to your ring (moving tokens and
>> such) you will *silently* lose data. That is on top of whatever data you
>> might end up losing if you run replicate_on_write=false and the only
>> node that got the data fails.
>>
>> But what is much worse -- with replicate_on_write set to false, the data
>> will NOT (in my tests) ever be replicated unless you explicitly read the
>> cell. The first read returns the wrong result; only subsequent reads
>> return adequate results. I haven't tested it, but the documentation
>> states that a range query will NOT do read repair and thus will not
>> force replication.
>>
>> The test I did went like this:
>> - replicate_on_write = false
>> - write something to node A (which should in theory replicate to node B)
>> - wait for a long time (longest was on the order of 5 hours)
>> - read from node B (and here I was getting null / wrong result)
>> - read from node B again (here you get what you'd expect after read
>>   repair)
>>
>> In essence, using replicate_on_write=false with rarely-read data
>> practically defeats the purpose of having replication in the first place
>> (failover, data redundancy).
>>
>> Or, in other words, this option doesn't look to be applicable to my
>> situation.
>>
>> It looks like I will get much better performance by simply writing to
>> two separate clusters rather than using a single cluster with
>> replication_factor=2. Which is kind of stupid :) I think something's
>> fishy with counters and replication.
>>
>> Edward Capriolo wrote:
>>> I misspoke really. It is not dangerous, you just have to understand
>>> what it means. This JIRA discusses it:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-3868
>>>
>>> On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay <scottm@...> wrote:
>>>> We're having a similar performance problem. Setting
>>>> 'replicate_on_write: false' fixes the performance issue in our tests.
>>>>
>>>> How dangerous is it? What exactly could go wrong?
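To make it concrete what goes wrong, the replicate_on_write=false test I quoted above amounts to this (a minimal sketch; Node and connect() are hypothetical stand-ins for the real client, with all reads and writes at consistency level ONE):

/** Hypothetical client handle bound to one coordinator; not a real API. */
interface Node {
    void increment(String rowKey, String column, long delta);
    Long read(String rowKey, String column);
}

public class ReplicateOnWriteTest {
    public static void main(String[] args) throws InterruptedException {
        Node nodeA = connect("node-a");
        Node nodeB = connect("node-b");

        // With replicate_on_write=false the increment lands on a single
        // replica and is never pushed to the other one.
        nodeA.increment("stats", "hits", 1);

        // Wait; nothing replicates in the background (the longest I
        // waited was on the order of 5 hours).
        Thread.sleep(5L * 60 * 60 * 1000);

        // First read via node B: null / wrong result -- but the read
        // itself triggers read repair between the replicas.
        System.out.println(nodeB.read("stats", "hits"));

        // Second read via node B: the expected value, after read repair.
        System.out.println(nodeB.read("stats", "hits"));
    }

    private static Node connect(String host) {
        throw new UnsupportedOperationException("stand-in for a real client");
    }
}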
>>>> On 12-11-27 01:44 PM, Edward Capriolo wrote:
>>>>> The difference between replication factor = 1 and replication
>>>>> factor > 1 is significant. Also, it sounds like your cluster is 2
>>>>> nodes, so going from RF=1 to RF=2 means double the load on both
>>>>> nodes.
>>>>>
>>>>> You may want to experiment with the very dangerous column family
>>>>> attribute:
>>>>>
>>>>> - replicate_on_write: Replicate every counter update from the leader
>>>>>   to the follower replicas. Accepts the values true and false.
>>>>>
>>>>> Edward
>>>>>
>>>>> On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <mkjellman@...> wrote:
>>>>>> Are you writing with QUORUM consistency or ONE?
>>>>>>
>>>>>> On 11/27/12 9:52 AM, "Sergey Olefir" <solf.lists@...> wrote:
>>>>>>> Hi Juan,
>>>>>>>
>>>>>>> thanks for your input!
>>>>>>>
>>>>>>> In my case, however, I doubt this is the case -- clients are able
>>>>>>> to push many more updates than is needed to saturate the
>>>>>>> replication_factor=2 case (e.g. I'm doing as many as 6x more
>>>>>>> increments when testing the 2-node cluster with
>>>>>>> replication_factor=1), so bandwidth between clients and server
>>>>>>> should be sufficient.
>>>>>>>
>>>>>>> Bandwidth between nodes in the cluster should also be quite
>>>>>>> sufficient since they are both in the same DC. But it is something
>>>>>>> to check, thanks!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Sergey
>>>>>>>
>>>>>>> Juan Valencia wrote:
>>>>>>>> Hi Sergey,
>>>>>>>>
>>>>>>>> I know I've had similar issues with counters which were
>>>>>>>> bottle-necked by network throughput. You might be seeing a problem
>>>>>>>> with throughput between the clients and Cassandra, or between the
>>>>>>>> two Cassandra nodes. It might not be your case, but that was what
>>>>>>>> happened to me :-)
>>>>>>>>
>>>>>>>> Juan
>>>>>>>>
>>>>>>>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a serious problem with counters performance and I can't
>>>>>>>>> seem to figure it out.
>>>>>>>>>
>>>>>>>>> Basically I'm building a system for accumulating some statistics
>>>>>>>>> "on the fly" via Cassandra distributed counters. For this I need
>>>>>>>>> counter updates to work "really fast", and herein lies my problem
>>>>>>>>> -- as soon as I enable replication_factor=2, the performance goes
>>>>>>>>> down the drain. This happens in my tests using both 1.0.x and
>>>>>>>>> 1.1.6.
>>>>>>>>>
>>>>>>>>> Let me elaborate:
>>>>>>>>>
>>>>>>>>> I have two boxes (virtual servers on top of physical servers
>>>>>>>>> rented specifically for this purpose, i.e. it's not a cloud, nor
>>>>>>>>> is it shared; virtual servers are managed by our admins as a way
>>>>>>>>> to limit damage, I suppose :)). The Cassandra partitioner is set
>>>>>>>>> to ByteOrderedPartitioner because I want to be able to do some
>>>>>>>>> range queries.
>>>>>>>>>
>>>>>>>>> First, I set up Cassandra individually on each box (not in a
>>>>>>>>> cluster) and test counter increment performance (exclusively
>>>>>>>>> increments, no reads). For the tests I use code that is intended
>>>>>>>>> to somewhat resemble the expected load pattern -- in particular,
>>>>>>>>> the majority of increments create new counters, with some
>>>>>>>>> updating (adding) to already existing counters.
>>>>>>>>> In this test each single node exhibits respectable performance --
>>>>>>>>> something on the order of 70k (seventy thousand) increments per
>>>>>>>>> second.
>>>>>>>>>
>>>>>>>>> I then join both of these nodes into a single cluster (using
>>>>>>>>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the
>>>>>>>>> same test using replication_factor=1. The performance is on the
>>>>>>>>> order of 120k increments per second. [...]
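For completeness, the load generator behind the numbers quoted above works along these lines (a minimal sketch; CounterClient is the same hypothetical stand-in as in the earlier sketch, and the 80/20 new-vs-existing split is illustrative rather than the exact test ratio):

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical client stand-in, repeated here for completeness. */
interface CounterClient {
    void increment(String rowKey, String column, long delta);
}

/** Load pattern: mostly brand-new counters, some repeat increments. */
public class IncrementLoad implements Runnable {
    static final AtomicLong TOTAL = new AtomicLong();

    private final CounterClient client;
    private final Random rnd = new Random();

    IncrementLoad(CounterClient client) {
        this.client = client;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // The majority of increments create a brand-new counter; the
            // rest add to one of a bounded set of existing counters.
            String key = (rnd.nextInt(100) < 80)
                    ? "ctr-" + Thread.currentThread().getId() + "-" + System.nanoTime()
                    : "ctr-existing-" + rnd.nextInt(1000);
            client.increment(key, "c", 1);
            TOTAL.incrementAndGet(); // sample periodically for increments/sec
        }
    }
}

Increments per second is just TOTAL sampled over time, across however many client threads it takes to saturate the nodes.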