"blow all the data away" ... how do you do that? What is the timestamp precision that you are using when creating key/col or key/supercol/col items?
I have seen a write of a key fail when the timestamp is identical to the previous timestamp of a deleted key/col. While I didn't examine the source code, I'm certain this is due to delete tombstones. I view this as an application error, because I was attempting the write within the GCGraceSeconds time period. If, however, I stopped Cassandra, blew away the data & commitlogs, and restarted, the write always succeeded (no surprise there).

I turned this behavior into a feature (of sorts). When this happens I increment a formerly-zero portion of the timestamp (the last digit of precision, which was always zero) and use it as a counter to track how many times a key/col was updated (max 9 for my purposes).

-phil

On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:

> We are using MapReduce to periodically verify and rebuild our secondary
> indexes, along with counting total records. We started to notice double
> counting of unique keys in single-machine standalone tests. We were finally
> able to reproduce the problem using the
> apache-cassandra-0.6.2-src/contrib/word_count example, just by re-running it
> multiple times. We are hoping someone can verify the bug.
>
> Re-run the tests and the word count in /tmp/word_count3/part-r-00000 will be
> 1000 ± ~200, and will change if you blow the data away and re-run. Notice
> that the setup script loops and inserts only 1000 records, so we expect the
> count to be 1000. Once the data is generated, re-running the setup script
> and/or MapReduce doesn't change the number (still off). The key is to blow
> all the data away and start over, which causes it to change.
>
> Can someone please verify this behavior?
>
> -Corey
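Phil's counter-in-the-last-digit trick can be sketched like this. The helper names are hypothetical, and the sketch assumes (as Phil describes) a base timestamp whose last digit of precision is always zero, leaving that digit free to carry an update count that also guarantees the new timestamp strictly exceeds the tombstone's:

```python
import time

def base_timestamp():
    # Millisecond wall clock scaled up to microseconds, so the
    # trailing digits are always zero (hypothetical scheme matching
    # Phil's description of a "last digit of precision" that is free).
    return int(time.time() * 1000) * 1000

def bump_timestamp(ts, updates):
    # Store an update count (0-9) in the formerly-zero last digit.
    # Because the counter only ever increases for a given base
    # timestamp, the new value always beats the tombstone's timestamp.
    assert 0 <= updates <= 9, "only single-digit counts fit"
    return ts - (ts % 10) + updates
```

This doubles as a crude per-cell version counter, at the cost of capping resolution at nine updates per base timestamp, which matches the "max 9 for my purposes" constraint above.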