On Sun, Jul 24, 2011 at 3:36 PM, aaron morton <aa...@thelastpickle.com> wrote:
> What's your use case ? There are people out there having good times with 
> counters, see
>
> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter

It's actually pretty similar to Twitter's click counting, but
apparently we have different requirements for accuracy.  It's possible
Rainbird does something on the front end to work around this issue; I'm
honestly not sure since they haven't released the code yet.

Anyway, when you're building network aggregate graphs and fail to add
the +100G of traffic from one switch to your site or metro aggregate,
people around here notice.  And people quickly start distrusting
graphs which don't look "real" and either ignore them completely or
complain.

Obviously, one should manage their Cassandra cluster to limit the
occurrence of timeouts, but frankly I don't want to be paged at 2am to
"fix" these kinds of problems.  If I knew "timeout" meant "failed to
increment counter", I could spool my changes on the client and try
again later, but that's not what timeout means: the increment may or
may not have been applied, so a retry can double count.  Without any
means to recover I've actually lost a lot of reliability that I
currently have with my single-server PostgreSQL data store.
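
A toy sketch of the timeout problem described above, in Python. It is
not the Cassandra client API: ToyCounter, WriteTimeout and the
fail_mode argument are hypothetical names made up for illustration.
The point is only that a client which sees a timeout cannot tell a
lost write from an applied one, so a blind retry is right in one case
and overcounts in the other.

```python
class WriteTimeout(Exception):
    """Client gave up waiting; says nothing about whether the write landed."""


class ToyCounter:
    """Hypothetical stand-in for a distributed counter (not a real API)."""

    def __init__(self):
        self.value = 0

    def increment(self, delta, fail_mode=None):
        # fail_mode "lost":    the write never reached the counter.
        # fail_mode "applied": the write landed but the ack was lost.
        if fail_mode != "lost":
            self.value += delta
        if fail_mode is not None:
            raise WriteTimeout()


def increment_with_retry(counter, delta, fail_mode):
    """Retry once on timeout, as a client spooling failed writes would."""
    try:
        counter.increment(delta, fail_mode)
    except WriteTimeout:
        counter.increment(delta)  # blind retry; outcome of first try unknown


lost = ToyCounter()
increment_with_retry(lost, 100, "lost")        # retry repairs the loss

applied = ToyCounter()
increment_with_retry(applied, 100, "applied")  # retry double counts

print(lost.value, applied.value)  # 100 200
```

Both calls look identical from the client side, which is why spooling
and retrying is only safe when a timeout is known to mean "not applied".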

Right now I'm trying to come up with a way that my distributed SNMP
pollers can build aggregates efficiently without counters, but that's
going to add a lot of complexity. :(
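
One possible shape for a counter-free design, sketched in Python under
loud assumptions: this is not the design described in this thread, and
SampleStore, put and total are hypothetical names backed by an
in-memory dict rather than a real cluster. Each poller writes its
sample to a column named after the switch, so repeating a write
overwrites the same value instead of adding to it, and the aggregate
is summed at read time.

```python
from collections import defaultdict


class SampleStore:
    """Hypothetical in-memory stand-in: row key -> {column name: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, aggregate, interval, switch, octets):
        # Same (row, column) written twice just overwrites, so a retry
        # after a timeout cannot inflate the total.
        self.rows[(aggregate, interval)][switch] = octets

    def total(self, aggregate, interval):
        # The aggregate is computed on read by summing every switch column.
        return sum(self.rows.get((aggregate, interval), {}).values())


store = SampleStore()
store.put("metro-1", 1311540000, "sw-a", 100)
store.put("metro-1", 1311540000, "sw-a", 100)  # retry after a timeout
store.put("metro-1", 1311540000, "sw-b", 40)

print(store.total("metro-1", 1311540000))  # 140
```

The trade-off is the extra complexity mentioned above: every read of an
aggregate has to fetch and sum all of its per-switch columns.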

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
