@Robin I'm pretty sure the GC issue is due to counters alone, since our traffic consists entirely of write-heavy counter increments. GC frequency also increases linearly with write load.
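In case it is useful to anyone trying to reproduce this, here is roughly how we check that ParNew frequency tracks write load: a small script that buckets young-gen collections per minute from the JVM GC log. The log path and line format here are assumptions (they depend on the -Xloggc / -XX:+PrintGCDetails flags in cassandra-env.sh), so treat it as a sketch rather than something that will run unmodified everywhere.

    # Bucket ParNew collections per minute of JVM uptime from a gc.log.
    # Assumes each young-gen collection appears on a line containing "ParNew"
    # with the JVM uptime (in seconds) just before "[GC"; adjust the regex if
    # your GC flags produce a different format.
    import re
    import sys
    from collections import Counter

    UPTIME = re.compile(r'(\d+\.\d+): \[GC')

    counts = Counter()
    with open(sys.argv[1] if len(sys.argv) > 1 else 'gc.log') as f:
        for line in f:
            if 'ParNew' not in line:
                continue
            m = UPTIME.search(line)
            if m:
                counts[int(float(m.group(1)) // 60)] += 1

    for minute in sorted(counts):
        print("minute %4d: %2d ParNew collections" % (minute, counts[minute]))

Plotting these per-minute counts next to the incoming write qps is what shows the roughly linear relationship.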
@Bartlomiej Under stress testing we see GC frequency increase and, consequently, write latency rise to several milliseconds. At 50k qps we had GC running every 1-2 seconds, and since each ParNew pause takes around 100 ms, we were spending roughly 10% of each server's time in GC. Also, we don't have persistent connections, but testing with persistent connections gives roughly the same result. At roughly 20k qps across 8 nodes with RF 2, we have young-gen GC running on each node approximately every 4 seconds, with a young-gen heap size of 3200 MB, which is already too big by any standard. Also, decreasing the replication factor from 2 to 1 reduced the GC frequency by a factor of 5-6. Any advice? Also, our traffic is evenly distributed across the nodes.
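For what it's worth, here is one way to turn those numbers into a per-increment garbage figure. It is only a back-of-the-envelope sketch: it assumes the young generation is close to full each time ParNew runs and that the allocation is dominated by the counter writes, both of which are only roughly true.

    # Back-of-the-envelope garbage-per-increment estimate from the figures above.
    # Big assumptions: the young gen is nearly full at each ParNew, and counter
    # writes dominate the allocation (both only roughly true).
    young_gen_mb  = 3200     # young generation size per node (MB)
    gc_interval_s = 4.0      # observed ParNew interval per node (seconds)
    cluster_qps   = 20000.0  # client increments per second across the cluster
    nodes         = 8
    rf            = 2        # each increment is applied on RF replicas

    replica_writes_per_node = cluster_qps * rf / nodes
    writes_per_gc_cycle = replica_writes_per_node * gc_interval_s
    kb_per_increment = young_gen_mb * 1024.0 / writes_per_gc_cycle

    print("replica writes per node per second: %.0f" % replica_writes_per_node)
    print("estimated garbage per increment: ~%.0f KB" % kb_per_increment)

With these inputs it comes out around 160 KB per increment, the same order of magnitude as the 50-100 KB I estimated in my earlier mail (quoted below); only part of the young generation is eden and not all of the allocation is counter-related, which is presumably why my earlier estimate is lower.

On Tue, Sep 18, 2012 at 1:36 PM, Robin Verlangen <ro...@us2.nl> wrote:
> We've not been trying to create inconsistencies as you describe above, but it
> seems legit that those situations cause problems.
>
> Sometimes you can see log messages indicating that counters are out of sync
> in the cluster and get "repaired". My guess would be that the repairs
> actually destroy the values, however I have no knowledge of the underlying
> techniques. I think this because those read repairs happen a lot (as you
> mention: lots of reads) and might get over-repaired or something? However,
> this is all just a guess. I hope someone with a lot of knowledge of
> Cassandra internals can shed some light on this.
>
> Best regards,
>
> Robin Verlangen
> Software engineer
>
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
>
> 2012/9/18 Bartłomiej Romański <b...@sentia.pl>
>>
>> Garbage is one more issue we are having with counters. We are operating
>> under very heavy load. Counters are spread over 7 nodes with SSD drives and
>> we often see CPU usage between 90-100%. We are doing mostly reads. Latency
>> is very important for us, so GC pauses longer than 10 ms (often around
>> 50-100 ms) are very annoying.
>>
>> I don't have actual numbers right now, but we've also got the impression
>> that Cassandra generates "too much" garbage. Is it possible that counters
>> are somehow to blame?
>>
>> @Rohit: Did you try something more stressful, like sending more traffic to
>> a node than it can actually handle, taking nodes up and down, or changing
>> the topology (moving/adding nodes)? I believe our problems come from very
>> high load combined with operations like these (adding new nodes, replacing
>> dead ones, etc.). I was expecting that Cassandra would fail some requests,
>> lose consistency temporarily, or something like that in such cases, but
>> generating highly incorrect values was very disappointing.
>>
>> Thanks,
>> Bartek
>>
>>
>> On Tue, Sep 18, 2012 at 9:30 AM, Robin Verlangen <ro...@us2.nl> wrote:
>> > @Rohit: We also use counters quite a lot (let's say 2000 increments/sec),
>> > but don't see the 50-100 KB of garbage per increment. Are you sure that
>> > memory is coming from your counters?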
>> >
>> > Best regards,
>> >
>> > Robin Verlangen
>> > Software engineer
>> >
>> > W http://www.robinverlangen.nl
>> > E ro...@us2.nl
>> >
>> >
>> > 2012/9/18 rohit bhatia <rohit2...@gmail.com>
>> >>
>> >> We use counters in an 8-node cluster with RF 2 on Cassandra 1.0.5. We
>> >> use phpcassa and execute CQL queries through Thrift to work with
>> >> composite types.
>> >>
>> >> We do not have any problem with overcounts, as we tally against an
>> >> RDBMS daily.
>> >>
>> >> It works fine, but we are seeing some GC pressure in the young
>> >> generation. By my calculation, around 50-100 KB of garbage is generated
>> >> per counter increment. Is this memory usage expected of counters?
>> >>
>> >> On Tue, Sep 18, 2012 at 7:16 AM, Bartłomiej Romański <b...@sentia.pl>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Does anyone have any experience with using Cassandra counters in
>> >> > production?
>> >> >
>> >> > We rely heavily on them and recently we've had a few very serious
>> >> > problems. Our counter values suddenly became a few times higher than
>> >> > expected. From the business point of view this is a disaster :/ There
>> >> > are also a few open major bugs related to them, some of them open for
>> >> > quite a long time (months).
>> >> >
>> >> > We are seriously considering going back to other solutions (e.g. SQL
>> >> > databases). We simply cannot afford incorrect counter values. We can
>> >> > tolerate losing a few increments from time to time, but we cannot
>> >> > tolerate counters suddenly being 3 times higher or lower than the
>> >> > expected values.
>> >> >
>> >> > What is the current status of counters? Should I consider them a
>> >> > production-ready feature and assume we just had some bad luck? Or
>> >> > should I rather treat them as an experimental feature and look for
>> >> > other solutions?
>> >> >
>> >> > Do you have any experience with them? Any comments would be very
>> >> > helpful for us!
>> >> >
>> >> > Thanks,
>> >> > Bartek
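PS: since the accuracy question keeps coming up, here is a minimal sketch of the kind of daily tally against the RDBMS mentioned in my earlier mail above. Every name in it (the keys, the two fetch helpers, the sample values) is made up for illustration; the real job queries Cassandra and the RDBMS instead of returning hard-coded data.

    # Minimal sketch of a daily counter reconciliation. The two fetch_* helpers
    # are placeholders returning sample data; in a real job they would query
    # Cassandra (e.g. via pycassa or CQL) and the RDBMS respectively.

    def fetch_cassandra_counts():
        # placeholder: {counter_key: value as seen in Cassandra}
        return {'page:home': 10231, 'page:search': 88412, 'page:help': 311}

    def fetch_rdbms_counts():
        # placeholder: {counter_key: authoritative value from the RDBMS}
        return {'page:home': 10230, 'page:search': 88412, 'page:help': 250}

    def reconcile(tolerance=0.01):
        # Report counters whose relative drift exceeds `tolerance`.
        cass = fetch_cassandra_counts()
        rdbms = fetch_rdbms_counts()
        for key in sorted(set(cass) | set(rdbms)):
            c, r = cass.get(key, 0), rdbms.get(key, 0)
            drift = abs(c - r) / float(max(r, 1))
            if drift > tolerance:
                print("DRIFT %s: cassandra=%d rdbms=%d (%.1f%%)"
                      % (key, c, r, 100 * drift))

    if __name__ == '__main__':
        reconcile()

The 1% tolerance is arbitrary; the point is just that a cheap daily diff like this is what gives us confidence that we are not overcounting.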