I have a loop that reads a counter, increments it by some integer, then goes 
off and does about 500ms of other work. After about 10 iterations of this loop, 
the counter value *sometimes* appears to be corrupted.

Looking at the logs, a sequence that just happened is:

Read counter - 15000
Increase counter by - 353
Read counter - 15353
Increase counter by - 1067
Read counter - 286079 (the new counter value is *very* different from what the 
increase should have produced, and is usually, suspiciously, around 280k)
Increase counter by - 875
Read counter - 286079  (the counter stops changing at a certain point)
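
To make the discrepancy in the sequence above concrete, here is a minimal sketch that replays the logged reads and increments and flags every read that does not equal the previous read plus the increment. It is plain Python with no Cassandra or Astyanax calls, and the function name `find_divergences` is hypothetical, chosen only for illustration.

```python
def find_divergences(reads, increments):
    """Return (read_index, expected, observed) for each read that does not
    equal the previous read plus the increment applied after it."""
    mismatches = []
    for i, delta in enumerate(increments):
        expected = reads[i] + delta   # what the counter should now hold
        observed = reads[i + 1]       # what the next read actually returned
        if observed != expected:
            mismatches.append((i + 1, expected, observed))
    return mismatches


# Values copied from the log excerpt above.
reads = [15000, 15353, 286079, 286079]
increments = [353, 1067, 875]

for index, expected, observed in find_divergences(reads, increments):
    print(f"read #{index}: expected {expected}, observed {observed}")
```

For the logged values this flags the third and fourth reads: 16420 was expected after the 1067 increment, and 286954 after the 875 increment, while both reads returned 286079.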


There is only 1 thread running this sequence, and consistency levels are set to 
ALL. The behavior is fairly repeatable - the unexpected mutation will happen 
at least 10% of the time I run this program, but at different points. When it 
does not go awry, I can run this loop many thousands of times and keep the 
counter exact. But if it starts happening to a specific counter, the counter 
will never "recover" and will continue to maintain its incorrect value even 
after successful subsequent writes.

I'm using the latest Astyanax driver on Cassandra 1.2.3 in a 3-node test 
cluster. It's also happened in development. Has anyone seen something like 
this? It feels almost too strange to be an actual bug but I'm stumped and have 
been looking at it too long :)

Thanks,
Josh

--
Josh Dzielak     
VP Engineering • Keen IO
Twitter • @dzello (https://twitter.com/dzello)
Mobile • 773-540-5264
