I'm running a large Hadoop job in which merge-with is called millions of times to aggregate values across about 1000 keys. Basically, we are counting the number of times each key occurs among all entries, using merge-with as the reduce function.
In the output, keys are often duplicated, with the repeated entry having a value of 1 or sometimes 2. I've attached a sample of the result map below so you can see what I mean: the keys 14336, 14464, and 14528 each appear twice.

This is coming right out of merge-with, so it's baffling why there would be duplicate keys, unless there is some kind of bug in that function when it is called under heavy load. Another theory is that something is going wrong during serialization in the intermediate steps. I am using cascading-clojure. Thanks for your input.

{14336 18176, 14336 2, 14368 19111, 14400 17161, 14432 17015, 14464 14604, 14464 1, 14496 20810, 14528 18759, 14528 1, 14560 28086, 13568 162, 14592 31956, 13600 437, 14624 38402, 13632 281, 14656 38429, 13664 1137, 14688 29531}
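One way to narrow down the serialization theory is to check whether the keys that print identically are actually the same kind of value. The sketch below is only a diagnostic idea, not a confirmed explanation: `duplicate-key-classes` is a hypothetical helper name, and the sample map uses a string key purely to illustrate two keys that look alike when printed but are not equal.

```clojure
;; With keys that really are equal, merge-with folds them into one entry.
(def merged (merge-with + {14336 1} {14336 1}))   ; => {14336 2}

;; Hypothetical helper: group a map's keys by their printed form and
;; return the classes of any keys that print the same.
(defn duplicate-key-classes
  [m]
  (->> (keys m)
       (group-by str)
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (map (fn [[printed ks]] [printed (vec (map class ks))]))
       (into {})))

;; Illustrative data only: a number key and a string key that print alike.
(def sample {14336 18176, "14336" 2, 14368 19111})

(def suspects (duplicate-key-classes sample))
```

If `suspects` comes back non-empty for the real result map, and the classes listed for a key differ, the duplicates are distinct keys as far as the map is concerned, which would point toward the intermediate serialization steps rather than merge-with itself.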