On Mon, Jul 25, 2011 at 11:24 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
> On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner <synfina...@gmail.com> wrote:
>> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>> What's your use case ? There are people out there having good times with
>>> counters, see
>>>
>>> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
>>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
>>
>> It's actually pretty similar to Twitter's click counting, but
>> apparently we have different requirements for accuracy. It's possible
>> Rainbird does something on the front end to solve for this issue- I'm
>> honestly not sure since they haven't released the code yet.
>>
>> Anyways, when you're building network aggregate graphs and fail to add
>> the +100G of traffic from one switch to your site or metro aggregate,
>> people around here notice. And people quickly start distrusting
>> graphs which don't look "real" and either ignore them completely or
>> complain.
>>
>> Obviously, one should manage their Cassandra cluster to limit the
>> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
>> "fix" these kind of problems. If I knew "timeout" meant "failed to
>> increment counter", I could spool my changes on the client and try
>> again later, but that's not what timeout means. Without any means to
>> recover I've actually lost a lot of reliability that I currently have
>> with my single PostgreSQL server backed data store.
>
> Just to make it very clear: *nobody* is arguing this is not a limitation.
>
> The thing is some find counters useful even while perfectly aware of
> that limitation and seems to be very productive with it, so we have
> added them. Truth is, if you can live with the limitations and manage
> the timeout to a bare minimum (hopefully 0), then you won't find much
> system that are able to scale counting both in term of number of
> counters and number of ops/s on each counter, and that across
> datacenters, like Cassandra counters does. And let's recall that
> while you don't know what happened on a timeout, you at least know
> when those happens, so you can compute the error margin.
>
> Again, this does not mean we don't want to fix the limitations, nor
> that we want you to wake up at 2am, and there is actually a ticket
> open for that:
> https://issues.apache.org/jira/browse/CASSANDRA-2495
> The problem is, so far, we haven't found any satisfying solution to
> that problem. If someone has a solution, please, please, share!
>
> But yes, counters in their current state don't fit everyone needs
> and we certainly don't want to hide it.
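To make the "compute the error margin" point above concrete, here is a rough client-side sketch. It is only an illustration under stated assumptions, not anything Cassandra or a client library provides: `CounterLedger`, `record_increment` and the `send` callable are hypothetical names, the builtin `TimeoutError` stands in for whatever exception a real client raises on a timeout, and the ledger only covers increments sent by this one client. Timed-out increments are never retried, since a retry could double count.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CounterLedger:
    """Client-side view of one counter (hypothetical helper, not a Cassandra API)."""
    confirmed: int = 0         # sum of increments the cluster acknowledged
    maybe_added: int = 0       # sum of positive deltas whose write timed out
    maybe_subtracted: int = 0  # sum of |negative deltas| whose write timed out

    def bounds(self) -> tuple[int, int]:
        """Lowest and highest value this client's writes could have produced."""
        low = self.confirmed - self.maybe_subtracted
        high = self.confirmed + self.maybe_added
        return low, high


def record_increment(ledger: CounterLedger,
                     send: Callable[[int], None],
                     delta: int) -> None:
    """Send one increment and record whether its outcome is known.

    `send` is an assumed stand-in for the real client call; it is expected
    to raise TimeoutError when the cluster does not answer in time.
    """
    try:
        send(delta)
    except TimeoutError:
        # Outcome unknown: the increment may or may not have been applied.
        # It is not retried here, because a retry could apply it twice.
        if delta >= 0:
            ledger.maybe_added += delta
        else:
            ledger.maybe_subtracted += -delta
    else:
        ledger.confirmed += delta
```

With something like this, a graph could at least flag a data point as "somewhere between low and high" instead of silently drawing a value that is missing a timed-out increment. It does not recover the lost certainty; it only makes the size of the uncertainty visible.
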
I think the Cassandra community has been pretty open about the
limitations and I can see there are some uses for them in their
current state. Probably my biggest concern is that I'm pretty new to
Cassandra and don't understand why occasionally I see timeouts even
under very low load (one single threaded client). Once I understood
the impacts wrt counters it went from "annoying" to "oh crap".

Anyways, as I said earlier, I understand this problem is "hard" and I
don't expect a fix in 0.8.2 :)

Mostly right now I'm just bummed because I'm pretty much back at
square one trying to create a scalable solution which meets our needs.
Not to say Cassandra won't be a part of it, but just that the
solution design has become a lot less obvious.

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"