> I don't understand what the three in parentheses values are exactly. I guess
> the last number is the count and the middle one is the number of increments,
> is that true ? What is the first string (identical in all the errors) ?

It's (UUID, clock, increment). Very briefly, counter columns in Cassandra are made up of multiple "shards". In the write path, a particular counter increment is executed by one "leader", which is one of the replicas of the counter. The leader will increment its own value, read its own full value (this is why "Replicate On Write" has to do reads in the write path for counters) and replicate it to the other nodes.

UUID "roughly" corresponds to a node in the cluster (UUIDs are sometimes refreshed, so it's not a strict correlation). The clock is supposed to be monotonically increasing for a given UUID.

> How can the last number (assuming it's the count) be negative knowing that I
> only sum positive numbers ?

I don't see a negative number in your paste?

> An other point is that the highest value seems to be *always* the good one
> (assuming this time that the middle number is the number of increments).

DISCLAIMER: This is me responding off the cuff without digging into it further.

Depends on the source of the problem. If the problem, as theorized in the ticket, is caused by non-clean shutdown of nodes, the expected result *should* be that we effectively "lose" counter increments. Given a particular leader among the replicas, suppose you increment counter C by N1, followed by an unclean shutdown with the value never having been written to the commit log. On the next increment of C by N2, a counter shard would be generated which has the value base+N2 rather than building on base+N1 (assuming the memtable wasn't flushed and no other writes to the same counter column happened). When this gets replicated to other nodes, they would see a value based on N1 and a value based on N2, both with the same clock. They would choose the higher one. In either case, as far as I can tell (off the top of my head), *some* counter increment is lost.

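To make the scenario above concrete, here is a rough toy model in Python. It is only a sketch of the reasoning in this mail, not Cassandra's actual code: the names (`merge`, `total`, `lost_increment_scenario`, the `"leader-uuid"` id) are made up for illustration, and it assumes one shard per leader holding (clock, value), with the higher value winning when two shards carry the same clock.

```python
from typing import Dict, Tuple

# Hypothetical, simplified model: one shard per leader id, holding (clock, value).
Shard = Tuple[int, int]      # (clock, value)
Counter = Dict[str, Shard]   # leader id -> shard


def merge(a: Counter, b: Counter) -> Counter:
    """Merge two replica views of a counter, shard by shard."""
    merged = dict(a)
    for node_id, shard in b.items():
        current = merged.get(node_id)
        # Tuple comparison: the higher clock wins; on equal clocks
        # the higher value wins (the assumption described above).
        if current is None or shard > current:
            merged[node_id] = shard
    return merged


def total(counter: Counter) -> int:
    """The counter's value is the sum of all shard values."""
    return sum(value for _clock, value in counter.values())


def lost_increment_scenario(base: int, n1: int, n2: int) -> int:
    """Value a replica ends up with after the unclean-shutdown sequence."""
    leader = "leader-uuid"  # placeholder id, not a real UUID
    clock = 1               # clock of the last durable shard (value == base)

    # The replica already received the shard for the first increment.
    replica = {leader: (clock + 1, base + n1)}

    # The leader crashed before N1 reached its commit log, so after restart
    # it builds the next shard from `base` again and reuses the same clock.
    after_restart = {leader: (clock + 1, base + n2)}

    return total(merge(replica, after_restart))
```

With base=10, N1=3 and N2=5 the replica ends up at 15 where 18 was expected; swapping N1 and N2 also gives 15. Either way one of the two increments is gone, and the surviving value is simply the larger of the two.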
The only way I can see (again off the top of my head) the resulting value being correct is if the later increment (N2 in this case) is somehow including N1 as well (e.g., because it was generated by first reading the current counter value).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)