We are using MapReduce to periodically verify and rebuild our secondary indexes and to count total records. We started to notice double counting of unique keys in single-machine standalone tests. We were finally able to reproduce the problem with the apache-cassandra-0.6.2-src/contrib/word_count example simply by re-running it multiple times. We are hoping someone can verify the bug.
Re-run the tests and the word count in /tmp/word_count3/part-r-00000 will be 1000 + ~200. The number changes if you blow the data away and re-run. Notice that the setup script loops and inserts only 1000 records, so we expect the count to be exactly 1000. Once the data has been generated, re-running the setup script and/or the MapReduce job doesn't change the number (it is still off). The key is to blow all the data away and start over, which causes the number to change. Can someone please verify this behavior? -Corey
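To make the "expected 1000" check repeatable between runs, a small script like the one below could be used. It is only a sketch under a few assumptions: that the job writes Hadoop's default TextOutputFormat (one `word<TAB>count` line per word), that every word in that output should total 1000 because the setup script inserts 1000 records, and that the default path is /tmp/word_count3/part-r-00000. The helper names `parse_counts` and `mismatches` are hypothetical, not part of the Cassandra example.

```python
import os
import sys

EXPECTED = 1000  # assumption: setup script inserts 1000 records, one word each
DEFAULT_PATH = "/tmp/word_count3/part-r-00000"


def parse_counts(lines):
    """Parse TextOutputFormat lines ("key<TAB>value") into {word: count}."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, value = line.rsplit("\t", 1)  # the count is the last tab-separated field
        counts[word] = counts.get(word, 0) + int(value)
    return counts


def mismatches(counts, expected=EXPECTED):
    """Return {word: count} for every word whose count differs from expected."""
    return {w: c for w, c in counts.items() if c != expected}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH
    if not os.path.exists(path):
        print(f"no job output found at {path}")
    else:
        with open(path) as f:
            bad = mismatches(parse_counts(f))
        print("OK" if not bad else f"MISMATCH: {bad}")
```

Running it after each fresh setup + MapReduce cycle would show whether the over-count (e.g. a value around 1200 instead of 1000) appears and whether it shifts after the data is wiped.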