Hi Sergey,

I've had similar issues with counters that turned out to be bottlenecked by network throughput. You might be seeing a throughput problem between the clients and Cassandra, or between the two Cassandra nodes. It might not be your case, but that was what happened to me :-)
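One rough way to check whether the network could be the limit is a back-of-the-envelope estimate like the sketch below. It is a deliberately crude model, not a description of how Cassandra actually ships counter updates: it assumes each increment crosses the wire once from the client and once more for every additional replica, and it ignores protocol overhead and acknowledgements. The function name and the 200-byte payload size are hypothetical placeholders; substitute a size measured on your own setup.

```python
def required_mbit_per_s(increments_per_s, bytes_per_increment, replication_factor):
    """Crude estimate of network load (Mbit/s) for a stream of counter increments.

    Assumption (not taken from Cassandra internals): every increment is sent
    once by the client and forwarded once to each of the other replicas.
    """
    client_bytes = increments_per_s * bytes_per_increment
    replica_bytes = client_bytes * (replication_factor - 1)
    return (client_bytes + replica_bytes) * 8 / 1_000_000


if __name__ == "__main__":
    # 70k increments/s is the single-node rate quoted in the thread;
    # 200 bytes per increment is a made-up figure for illustration only.
    for rf in (1, 2):
        print(rf, required_mbit_per_s(70_000, 200, rf))
```

Comparing the result with the bandwidth actually available between the virtual servers (measured with a tool of your choice) gives a first hint as to whether the link is worth investigating.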
Juan

On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.li...@gmail.com> wrote:

> Hi,
>
> I have a serious problem with counters performance and I can't seem to figure it out.
>
> Basically I'm building a system for accumulating some statistics "on the fly" via Cassandra distributed counters. For this I need counter updates to work "really fast" and herein lies my problem -- as soon as I enable replication_factor = 2, the performance goes down the drain. This happens in my tests using both 1.0.x and 1.1.6.
>
> Let me elaborate:
>
> I have two boxes (virtual servers on top of physical servers rented specifically for this purpose, i.e. it's not a cloud, nor it is shared; virtual servers are managed by our admins as a way to limit damage as I suppose :)). Cassandra partitioner is set to ByteOrderedPartitioner because I want to be able to do some range queries.
>
> First, I set up Cassandra individually on each box (not in a cluster) and test counter increments performance (exclusively increments, no reads). For tests I use code that is intended to somewhat resemble the expected load pattern -- particularly the majority of increments create new counters with some updating (adding) to already existing counters. In this test each single node exhibits respectable performance - something on the order of 70k (seventy thousand) increments per second.
>
> I then join both of these nodes into single cluster (using SimpleSnitch and SimpleStrategy, nothing fancy yet). I then run the same test using replication_factor=1. The performance is on the order of 120k increments per second -- which seems to be a reasonable increase over the single node performance.
>
> HOWEVER I then rerun the same test on the two-node cluster using replication_factor=2 -- which is the least I'll need for actual production for redundancy purposes. And the performance I get is absolutely horrible -- much, MUCH worse than even single-node performance -- something on the order of less than 25k increments per second. In addition to clients not being able to push updates fast enough, I also see a lot of 'messages dropped' messages in the Cassandra log under this load.
>
> Could anyone advise what could be causing such drastic performance drop under replication_factor=2? I was expecting something on the order of single-node performance, not approximately 3x less.
>
> When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes through the roof. On 1.0.x I think it looked more like disk overload, but I'm not sure (being on virtual server I apparently can't see true iostats).
>
> I do have Cassandra data on a separate disk, commit log and cache are currently on the same disk as the system. I experimented with commit log flush modes and even with disabling commit log at all -- but it doesn't seem to have noticeable impact on the performance when under replication_factor=2.
>
> Any suggestions and hints will be much appreciated :) And please let me know if I need to share additional information about the configuration I'm running on.
>
> Best regards,
> Sergey
>
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.