We are using MapReduce to periodically verify and rebuild our secondary
indexes and to count total records.  We started to notice double
counting of unique keys in single-machine standalone tests.  We were finally
able to reproduce the problem using the
apache-cassandra-0.6.2-src/contrib/word_count example, simply by
re-running it multiple times.  We are hoping someone can verify the bug.

Re-run the tests and the word count in /tmp/word_count3/part-r-00000 will
be 1000 plus roughly 200, and it will change if you blow the data away and
re-run.  Note that the setup script loops and inserts only 1000 records, so
we expect the count to be 1000.  Once the data is generated, re-running the
setup script and/or the MapReduce job doesn't change the number (it is still
off).  The key is to blow all the data away and start over, which causes the
number to change.
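
To make the comparison against 1000 easy to repeat, here is a minimal
sketch of a checker for the output file.  It assumes each line of
part-r-00000 is "key<TAB>count" (the usual Hadoop text output layout);
the helper name total_count is made up for illustration and is not part
of the word_count example.

```python
from pathlib import Path

EXPECTED = 1000  # the setup script inserts 1000 records


def total_count(lines):
    """Sum the last tab-separated column of each non-empty line.

    Assumes 'key<TAB>count' lines; a line without a tab raises ValueError.
    """
    total = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        _key, count = line.rsplit("\t", 1)
        total += int(count)
    return total


if __name__ == "__main__":
    path = Path("/tmp/word_count3/part-r-00000")
    if path.exists():
        got = total_count(path.read_text().splitlines())
        print(f"total={got} expected={EXPECTED} diff={got - EXPECTED}")
    else:
        print(f"{path} not found; run the word_count job first")
```

Running it after each fresh start (data blown away, setup and job re-run)
should show whether the difference from 1000 varies between runs.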

Can someone please verify this behavior?

-Corey
