I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes such a huge performance drop when moving from replication_factor=1 to replication_factor=2 for counters -- and, more importantly, how to resolve the problem. A 2x-3x drop when moving from replication_factor=1 to replication_factor=2 on two nodes would be reasonable; 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication_factor=1 in parallel rather than rely on Cassandra replication.
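For context, the in-memory aggregation I already do is roughly the following (a minimal sketch; CounterClient is a hypothetical stand-in for the actual client library, not a real API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the real Cassandra client. */
interface CounterClient {
    void increment(String rowKey, String column, long delta);
}

/** Buffers increments in memory; flush() writes one net delta per counter. */
public class CounterBuffer {
    private final ConcurrentHashMap<String, AtomicLong> pending =
            new ConcurrentHashMap<String, AtomicLong>();
    private final CounterClient client;

    public CounterBuffer(CounterClient client) {
        this.client = client;
    }

    /** Called once per event; purely in-memory, cheap. */
    public void add(String counterKey, long delta) {
        AtomicLong sum = pending.get(counterKey);
        if (sum == null) {
            AtomicLong fresh = new AtomicLong();
            AtomicLong raced = pending.putIfAbsent(counterKey, fresh);
            sum = (raced != null) ? raced : fresh;
        }
        sum.addAndGet(delta);
    }

    /** Called periodically; n adds to one key become a single increment. */
    public void flush() {
        for (Map.Entry<String, AtomicLong> e : pending.entrySet()) {
            long delta = e.getValue().getAndSet(0);
            if (delta != 0) {
                client.increment(e.getKey(), "count", delta);
            }
        }
    }
}

Even with this batching in place, the periodic flush still has to keep up with the incoming rate, which is exactly where the replication_factor=2 slowdown hurts.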
And yes, Rainbird was the inspiration for what we are trying to do here :)

Edward Capriolo wrote:
> Cassandra's counters read on increment. Additionally they are distributed,
> so there can be multiple reads per increment. If they are not fast enough
> and you have exhausted all tuning options, add more servers to handle the
> load.
>
> In many cases incrementing the same counter n times can be avoided.
>
> Twitter's Rainbird did just that. It avoided multiple counter increments
> by batching them.
>
> I have done a similar thing using Cassandra and Kafka.
>
> https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
>
> On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@...> wrote:
>> Hi, thanks for your suggestions.
>>
>> Regarding replicate=2 vs. replicate=1 performance: I expected the
>> following configurations to perform similarly:
>> - single node, replication_factor=1
>> - two nodes, replication_factor=2 (okay, this probably should be a bit
>>   slower due to additional overhead).
>>
>> However, what I'm seeing is that the second option (replication_factor=2)
>> is about THREE times slower than the single node.
>>
>> Regarding replicate_on_write -- it is, in fact, a dangerous option. As
>> the JIRA discusses, if you make changes to your ring (moving tokens and
>> such) you will *silently* lose data. That is on top of whatever data you
>> might end up losing if you run replicate_on_write=false and the only
>> node that got the data fails.
>>
>> But what is much worse -- with replicate_on_write set to false, the data
>> will NOT (in my tests) ever be replicated unless you explicitly read the
>> cell. The first read returns the wrong result; only subsequent reads
>> return adequate results. I haven't tested it, but the documentation
>> states that a range query will NOT do read repair and thus will not
>> force replication.
>>
>> The test I did went like this:
>> - replicate_on_write = false
>> - write something to node A (which should in theory replicate to node B)
>> - wait for a long time (longest was on the order of 5 hours)
>> - read from node B (and here I was getting null / wrong result)
>> - read from node B again (here you get what you'd expect after read
>>   repair)
>>
>> In essence, using replicate_on_write=false with rarely-read data
>> practically defeats the purpose of having replication in the first place
>> (failover, data redundancy).
>>
>> Or, in other words, this option doesn't look to be applicable to my
>> situation.
>>
>> It looks like I will get much better performance by simply writing to
>> two separate clusters rather than using a single cluster with
>> replication_factor=2. Which is kind of stupid :) I think something's
>> fishy with counters and replication.
>>
>> Edward Capriolo wrote:
>>> I misspoke really. It is not dangerous, you just have to understand
>>> what it means. This JIRA discusses it:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-3868
>>>
>>> On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay <scottm@...> wrote:
>>>> We're having a similar performance problem. Setting
>>>> 'replicate_on_write: false' fixes the performance issue in our tests.
>>>>
>>>> How dangerous is it? What exactly could go wrong?
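To make it concrete what goes wrong, the replicate_on_write=false test I quoted above amounts to this (a minimal sketch; Node and connect() are hypothetical stand-ins for the real client, with all reads and writes at consistency level ONE):

/** Hypothetical client handle bound to one coordinator; not a real API. */
interface Node {
    void increment(String rowKey, String column, long delta);
    Long read(String rowKey, String column);
}

public class ReplicateOnWriteTest {
    public static void main(String[] args) throws InterruptedException {
        Node nodeA = connect("node-a");
        Node nodeB = connect("node-b");

        // With replicate_on_write=false the increment lands on a single
        // replica and is never pushed to the other one.
        nodeA.increment("stats", "hits", 1);

        // Wait; nothing replicates in the background (the longest I
        // waited was on the order of 5 hours).
        Thread.sleep(5L * 60 * 60 * 1000);

        // First read via node B: null / wrong result -- but the read
        // itself triggers read repair between the replicas.
        System.out.println(nodeB.read("stats", "hits"));

        // Second read via node B: the expected value, after read repair.
        System.out.println(nodeB.read("stats", "hits"));
    }

    private static Node connect(String host) {
        throw new UnsupportedOperationException("stand-in for a real client");
    }
}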
>>>> On 12-11-27 01:44 PM, Edward Capriolo wrote:
>>>>> The difference between replication factor = 1 and replication
>>>>> factor > 1 is significant. Also, it sounds like your cluster is 2
>>>>> nodes, so going from RF=1 to RF=2 means double the load on both
>>>>> nodes.
>>>>>
>>>>> You may want to experiment with the very dangerous column family
>>>>> attribute:
>>>>>
>>>>> - replicate_on_write: Replicate every counter update from the leader
>>>>>   to the follower replicas. Accepts the values true and false.
>>>>>
>>>>> Edward
>>>>>
>>>>> On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <mkjellman@...> wrote:
>>>>>> Are you writing with QUORUM consistency or ONE?
>>>>>>
>>>>>> On 11/27/12 9:52 AM, "Sergey Olefir" <solf.lists@...> wrote:
>>>>>>> Hi Juan,
>>>>>>>
>>>>>>> thanks for your input!
>>>>>>>
>>>>>>> In my case, however, I doubt this is the case -- clients are able
>>>>>>> to push many more updates than is needed to saturate the
>>>>>>> replication_factor=2 case (e.g. I'm doing as many as 6x more
>>>>>>> increments when testing the 2-node cluster with
>>>>>>> replication_factor=1), so bandwidth between clients and server
>>>>>>> should be sufficient.
>>>>>>>
>>>>>>> Bandwidth between nodes in the cluster should also be quite
>>>>>>> sufficient since they are both in the same DC. But it is something
>>>>>>> to check, thanks!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Sergey
>>>>>>>
>>>>>>> Juan Valencia wrote:
>>>>>>>> Hi Sergey,
>>>>>>>>
>>>>>>>> I know I've had similar issues with counters which were
>>>>>>>> bottle-necked by network throughput. You might be seeing a problem
>>>>>>>> with throughput between the clients and Cassandra, or between the
>>>>>>>> two Cassandra nodes. It might not be your case, but that was what
>>>>>>>> happened to me :-)
>>>>>>>>
>>>>>>>> Juan
>>>>>>>>
>>>>>>>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a serious problem with counters performance and I can't
>>>>>>>>> seem to figure it out.
>>>>>>>>>
>>>>>>>>> Basically I'm building a system for accumulating some statistics
>>>>>>>>> "on the fly" via Cassandra distributed counters. For this I need
>>>>>>>>> counter updates to work "really fast", and herein lies my problem
>>>>>>>>> -- as soon as I enable replication_factor=2, the performance goes
>>>>>>>>> down the drain. This happens in my tests using both 1.0.x and
>>>>>>>>> 1.1.6.
>>>>>>>>>
>>>>>>>>> Let me elaborate:
>>>>>>>>>
>>>>>>>>> I have two boxes (virtual servers on top of physical servers
>>>>>>>>> rented specifically for this purpose, i.e. it's not a cloud, nor
>>>>>>>>> is it shared; virtual servers are managed by our admins as a way
>>>>>>>>> to limit damage, I suppose :)). The Cassandra partitioner is set
>>>>>>>>> to ByteOrderedPartitioner because I want to be able to do some
>>>>>>>>> range queries.
>>>>>>>>>
>>>>>>>>> First, I set up Cassandra individually on each box (not in a
>>>>>>>>> cluster) and test counter increment performance (exclusively
>>>>>>>>> increments, no reads). For the tests I use code that is intended
>>>>>>>>> to somewhat resemble the expected load pattern -- in particular,
>>>>>>>>> the majority of increments create new counters, with some
>>>>>>>>> updating (adding) to already existing counters.
>>>>>>>>> In this test each single node exhibits respectable performance --
>>>>>>>>> something on the order of 70k (seventy thousand) increments per
>>>>>>>>> second.
>>>>>>>>>
>>>>>>>>> I then join both of these nodes into a single cluster (using
>>>>>>>>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the
>>>>>>>>> same test using replication_factor=1. The performance is on the
>>>>>>>>> order of 120k increments per second. [...]
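For completeness, the load generator behind the numbers quoted above works along these lines (a minimal sketch; CounterClient is the same hypothetical stand-in as in the earlier sketch, and the 80/20 new-vs-existing split is illustrative rather than the exact test ratio):

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical client stand-in, repeated here for completeness. */
interface CounterClient {
    void increment(String rowKey, String column, long delta);
}

/** Load pattern: mostly brand-new counters, some repeat increments. */
public class IncrementLoad implements Runnable {
    static final AtomicLong TOTAL = new AtomicLong();

    private final CounterClient client;
    private final Random rnd = new Random();

    IncrementLoad(CounterClient client) {
        this.client = client;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // The majority of increments create a brand-new counter; the
            // rest add to one of a bounded set of existing counters.
            String key = (rnd.nextInt(100) < 80)
                    ? "ctr-" + Thread.currentThread().getId() + "-" + System.nanoTime()
                    : "ctr-existing-" + rnd.nextInt(1000);
            client.increment(key, "c", 1);
            TOTAL.incrementAndGet(); // sample periodically for increments/sec
        }
    }
}

Increments per second is just TOTAL sampled over time, across however many client threads it takes to saturate the nodes.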