Are you writing with QUORUM consistency or ONE?
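
At RF=2, QUORUM means every increment has to be acknowledged by both
replicas, so the client's write consistency level matters a lot here. If
you can go through cqlsh, you can pin the level per statement and compare.
A rough sketch only -- CQL 2 syntax as on 1.0/1.1, with placeholder
keyspace/column family/column names ('stats', 'counters', 'hits'):

    USE stats;

    -- increment acknowledged by a single replica
    UPDATE counters USING CONSISTENCY ONE
       SET hits = hits + 1 WHERE KEY = 'row1';

    -- increment acknowledged by a quorum (= both nodes at RF=2)
    UPDATE counters USING CONSISTENCY QUORUM
       SET hits = hits + 1 WHERE KEY = 'row1';

The Java clients (Hector, Astyanax, etc.) have equivalent per-operation or
per-keyspace consistency settings, so it's worth checking what your test
client defaults to.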

On 11/27/12 9:52 AM, "Sergey Olefir" <solf.li...@gmail.com> wrote:

> Hi Juan,
>
> thanks for your input!
>
> In my case, however, I doubt this is the problem -- clients are able to
> push far more updates than are needed to saturate the
> replication_factor=2 case (e.g. I'm pushing as many as 6x more increments
> when testing the 2-node cluster with replication_factor=1), so bandwidth
> between the clients and the server should be sufficient.
>
> Bandwidth between the nodes in the cluster should also be quite
> sufficient, since they are both in the same DC. But it is something to
> check, thanks!
>
> Best regards,
> Sergey
>
>
> Juan Valencia wrote:
>> Hi Sergey,
>>
>> I know I've had similar issues with counters which were bottle-necked by
>> network throughput. You might be seeing a problem with throughput
>> between the clients and Cass, or between the two Cass nodes. It might
>> not be your case, but that was what happened to me :-)
>>
>> Juan
>>
>>
>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...> wrote:
>>
>>> Hi,
>>>
>>> I have a serious problem with counter performance and I can't seem to
>>> figure it out.
>>>
>>> Basically I'm building a system for accumulating some statistics "on
>>> the fly" via Cassandra distributed counters. For this I need counter
>>> updates to work "really fast", and herein lies my problem -- as soon as
>>> I enable replication_factor=2, performance goes down the drain. This
>>> happens in my tests on both 1.0.x and 1.1.6.
>>>
>>> Let me elaborate:
>>>
>>> I have two boxes (virtual servers on top of physical servers rented
>>> specifically for this purpose, i.e. it's not a cloud, nor is it shared;
>>> the virtual servers are managed by our admins as a way to limit damage,
>>> I suppose :)). The Cassandra partitioner is set to
>>> ByteOrderedPartitioner because I want to be able to do some range
>>> queries.
>>>
>>> First, I set up Cassandra individually on each box (not in a cluster)
>>> and test counter increment performance (exclusively increments, no
>>> reads). For these tests I use code that is intended to somewhat
>>> resemble the expected load pattern -- in particular, the majority of
>>> increments create new counters, with some updates (adds) to already
>>> existing counters. In this test each single node exhibits respectable
>>> performance -- something on the order of 70k (seventy thousand)
>>> increments per second.
>>>
>>> I then join both of these nodes into a single cluster (using
>>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the same
>>> test with replication_factor=1. The performance is on the order of 120k
>>> increments per second -- which seems a reasonable increase over the
>>> single-node performance.
>>>
>>> HOWEVER, I then rerun the same test on the two-node cluster with
>>> replication_factor=2 -- which is the least I'll need in actual
>>> production for redundancy purposes. The performance I get is absolutely
>>> horrible -- much, MUCH worse than even single-node performance --
>>> something on the order of less than 25k increments per second. In
>>> addition to clients not being able to push updates fast enough, I also
>>> see a lot of 'messages dropped' entries in the Cassandra log under this
>>> load.
>>>
>>> Could anyone advise what could be causing such a drastic performance
>>> drop under replication_factor=2? I was expecting something on the order
>>> of single-node performance, not approximately 3x less.
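>>>
>>> For reference, the setup itself is nothing special -- just a plain
>>> counter column family in a SimpleStrategy keyspace, roughly along these
>>> lines (cassandra-cli syntax; the names here are placeholders rather
>>> than my actual schema):
>>>
>>>     create keyspace stats
>>>       with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>>>       and strategy_options = {replication_factor:2};
>>>
>>>     use stats;
>>>
>>>     create column family counters
>>>       with default_validation_class = CounterColumnType
>>>       and comparator = UTF8Type
>>>       and key_validation_class = UTF8Type;
>>>
>>>     incr counters['row1']['hits'];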
>>>
>>> When testing replication_factor=2 on 1.1.6 I can see that CPU usage
>>> goes through the roof. On 1.0.x I think it looked more like disk
>>> overload, but I'm not sure (being on a virtual server I apparently
>>> can't see the true iostats).
>>>
>>> I do have the Cassandra data on a separate disk; the commit log and
>>> caches are currently on the same disk as the system. I experimented
>>> with the commit log flush modes and even with disabling the commit log
>>> entirely -- but none of that seems to have a noticeable impact on
>>> performance under replication_factor=2.
>>>
>>> Any suggestions and hints will be much appreciated :) And please let me
>>> know if I need to share additional information about the configuration
>>> I'm running.
>>>
>>> Best regards,
>>> Sergey
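
Also, when the dropped-message warnings show up, 'nodetool tpstats' on each
node should tell you which message types are being dropped and which thread
pools are backing up, and the commit log behaviour you mention is driven by
a couple of cassandra.yaml settings. A quick sketch (the values shown are
just the 1.1 defaults; adjust hosts and paths for your setup):

    # run on each node while the test is under way
    nodetool -h <node-address> tpstats

    # cassandra.yaml -- commit log placement and sync mode (defaults shown)
    commitlog_directory: /var/lib/cassandra/commitlog
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

Since the commit log currently shares the system disk, moving
commitlog_directory to its own spindle sometimes helps under heavy write
load as well.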