Re: counters + replication = awful performance?

Scott McKay Tue, 27 Nov 2012 15:14:09 -0800

We're having a similar performance problem.  Setting
'replicate_on_write: false' fixes the performance issue in our tests.


How dangerous is it?  What exactly could go wrong?

On 12-11-27 01:44 PM, Edward Capriolo wrote:
> The difference between replication factor = 1 and replication factor
> greater than 1 is significant. Also, it sounds like your cluster has
> 2 nodes, so going from RF=1 to RF=2 doubles the load on both nodes.
>
> You may want to experiment with the very dangerous column family
> attribute:
>
> - replicate_on_write: Replicate every counter update from the leader
>   to the follower replicas. Accepts the values true and false.
>
> Edward
> On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman
> <mkjell...@barracuda.com> wrote:
>
>     Are you writing with QUORUM consistency or ONE?
>
>     On 11/27/12 9:52 AM, "Sergey Olefir" <solf.li...@gmail.com> wrote:
>
>     >Hi Juan,
>     >
>     >thanks for your input!
>     >
>     >In my case, however, I doubt this is the problem -- clients are able
>     >to push many more updates than I need to saturate the
>     >replication_factor=2 case (e.g. I'm doing as many as 6x more
>     >increments when testing a 2-node cluster with replication_factor=1),
>     >so bandwidth between clients and server should be sufficient.
>     >
>     >Bandwidth between nodes in the cluster should also be quite
>     >sufficient since they are both in the same DC. But it is something
>     >to check, thanks!
>     >
>     >Best regards,
>     >Sergey
>     >
>     >
>     >Juan Valencia wrote
>     >> Hi Sergey,
>     >>
>     >> I know I've had similar issues with counters that were bottlenecked
>     >> by network throughput. You might be seeing a throughput problem
>     >> between the clients and Cass or between the two Cass nodes. It might
>     >> not be your case, but that was what happened to me :-)
>     >>
>     >> Juan
>     >>
>     >>
>     >> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...> wrote:
>     >>
>     >>> Hi,
>     >>>
>     >>> I have a serious problem with counter performance and I can't seem
>     >>> to figure it out.
>     >>>
>     >>> Basically I'm building a system for accumulating some statistics
>     >>> "on the fly" via Cassandra distributed counters. For this I need
>     >>> counter updates to work "really fast", and herein lies my problem --
>     >>> as soon as I enable replication_factor = 2, the performance goes
>     >>> down the drain. This happens in my tests with both 1.0.x and 1.1.6.
>     >>>
>     >>> Let me elaborate:
>     >>>
>     >>> I have two boxes (virtual servers on top of physical servers rented
>     >>> specifically for this purpose, i.e. it's not a cloud, nor is it
>     >>> shared; the virtual servers are managed by our admins, I suppose as
>     >>> a way to limit damage :)). The Cassandra partitioner is set to
>     >>> ByteOrderedPartitioner because I want to be able to do some range
>     >>> queries.
>     >>>
>     >>> First, I set up Cassandra individually on each box (not in a
>     >>> cluster) and test counter increment performance (exclusively
>     >>> increments, no reads). For the tests I use code that is intended to
>     >>> roughly resemble the expected load pattern -- in particular, the
>     >>> majority of increments create new counters, with some updating
>     >>> (adding to) already existing counters. In this test each single node
>     >>> exhibits respectable performance: something on the order of 70k
>     >>> (seventy thousand) increments per second.
>     >>>
>     >>> I then join both of these nodes into a single cluster (using
>     >>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the same
>     >>> test using replication_factor=1. The performance is on the order of
>     >>> 120k increments per second -- which seems to be a reasonable
>     >>> increase over the single-node performance.
>     >>>
>     >>>
>     >>> HOWEVER, I then rerun the same test on the two-node cluster using
>     >>> replication_factor=2 -- which is the least I'll need in actual
>     >>> production for redundancy purposes. And the performance I get is
>     >>> absolutely horrible -- much, MUCH worse than even single-node
>     >>> performance -- something on the order of less than 25k increments
>     >>> per second. In addition to clients not being able to push updates
>     >>> fast enough, I also see a lot of 'messages dropped' messages in the
>     >>> Cassandra log under this load.
>     >>>
>     >>> Could anyone advise what could be causing such a drastic performance
>     >>> drop under replication_factor=2? I was expecting something on the
>     >>> order of single-node performance, not approximately 3x less.
>     >>>
>     >>>
>     >>> When testing replication_factor=2 on 1.1.6 I can see that CPU usage
>     >>> goes through the roof. On 1.0.x I think it looked more like disk
>     >>> overload, but I'm not sure (being on a virtual server, I apparently
>     >>> can't see true iostats).
>     >>>
>     >>> I do have Cassandra data on a separate disk; the commit log and
>     >>> cache are currently on the same disk as the system. I experimented
>     >>> with commit log flush modes and even with disabling the commit log
>     >>> entirely -- but it doesn't seem to have a noticeable impact on
>     >>> performance under replication_factor=2.
>     >>>
>     >>>
>     >>> Any suggestions and hints will be much appreciated :) And please let
>     >>> me know if I need to share additional information about the
>     >>> configuration I'm running on.
>     >>>
>     >>> Best regards,
>     >>> Sergey
>
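
Edward's remark that moving a two-node cluster from RF=1 to RF=2 doubles the load on both nodes can be illustrated with a toy model of SimpleStrategy replica placement, which puts the first replica on the node owning the key's token and any further replicas on the next nodes clockwise around the ring. This is a minimal sketch, not Cassandra code: the two-node ring, its string tokens (standing in for ByteOrderedPartitioner, where keys sort by their raw bytes) and the helper names replicas_for and writes_per_node are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def replicas_for(key: str, ring: list[tuple[str, str]], rf: int) -> list[str]:
    """Return the nodes that hold `key` under a simplified SimpleStrategy.

    `ring` is a list of (token, node) pairs sorted by token. Keys are compared
    to tokens directly, as with an order-preserving partitioner. The primary
    replica is the first node whose token is >= the key (wrapping past the
    last token back to the first); extra replicas are the next nodes clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def writes_per_node(keys: list[str], ring: list[tuple[str, str]], rf: int) -> dict[str, int]:
    """Count how many replica writes each node receives for one increment per key."""
    load = Counter()
    for key in keys:
        load.update(replicas_for(key, ring, rf))
    return dict(load)


# Hypothetical two-node ring: node A (token "m") owns keys up to and including
# "m", node B (token "z") owns keys after "m" up to "z"; later keys wrap to A.
RING = [("m", "A"), ("z", "B")]
# 25 hypothetical counter keys, "a-counter" through "y-counter".
KEYS = [f"{c}-counter" for c in "abcdefghijklmnopqrstuvwxy"]

print(writes_per_node(KEYS, RING, rf=1))  # {'A': 12, 'B': 13}
print(writes_per_node(KEYS, RING, rf=2))  # {'A': 25, 'B': 25}
```

With RF=1 the 25 increments are split between the two nodes; with RF=2 each node applies all 25, so the cluster performs twice as many replica writes for the same client workload. The model deliberately ignores any extra per-write work that counters may do when replicating.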

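Sergey's expectation in the quoted thread can be checked with back-of-envelope arithmetic on the figures from his message. The sketch assumes a naive model (an assumption, not something stated in the thread) in which every increment must be applied on RF replicas, so the client-visible rate is the RF=1 cluster throughput divided by RF; under that model RF=2 on two nodes should still deliver roughly single-node throughput.

```python
# Figures quoted in Sergey's message (increments per second).
SINGLE_NODE = 70_000            # one standalone node
TWO_NODE_RF1 = 120_000          # two-node cluster, replication_factor=1
TWO_NODE_RF2_OBSERVED = 25_000  # two-node cluster, replication_factor=2 ("less than 25k")


def naive_expected(aggregate_rf1: float, rf: int) -> float:
    """Naive model (assumption): each increment is applied on `rf` replicas,
    so client-visible throughput is the RF=1 aggregate capacity divided by rf."""
    return aggregate_rf1 / rf


expected_rf2 = naive_expected(TWO_NODE_RF1, 2)
print(f"expected with RF=2: ~{expected_rf2:,.0f}/s")  # expected with RF=2: ~60,000/s
print(f"observed vs. expected: {expected_rf2 / TWO_NODE_RF2_OBSERVED:.1f}x short")  # 2.4x short
print(f"observed vs. single node: {SINGLE_NODE / TWO_NODE_RF2_OBSERVED:.1f}x less")  # 2.8x less
```

Because the observed figure was "less than 25k", these ratios are lower bounds: the measured RF=2 rate falls at least 2.4x below even this naive estimate, which is the gap the thread is trying to explain.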
-- 
*Scott McKay*, Sr. Software Developer
MailChannels

Tel: +1 604 685 7488 x 509
www.mailchannels.com
