I'm running a large Hadoop job in which merge-with is called millions
of times to aggregate values for about 1000 keys. Essentially, we are
counting how many times each key occurs across all entries, using
merge-with as the reduce function.
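Here is a minimal sketch of the counting pattern described above. It assumes each input record has already been turned into a small {key count} map. The names `partial-counts` and `combine-counts` are hypothetical and do not come from the actual job or from cascading-clojure:

```clojure
;; Hypothetical partial counts. In the real job these would come out of the
;; Hadoop / cascading-clojure pipeline, not from a literal vector.
(def partial-counts
  [{14336 3, 14464 1}
   {14336 2, 14528 5}
   {14464 4}])

(defn combine-counts
  "Reduce a sequence of {key count} maps into one map of summed counts.
  merge-with + adds the values of keys that are equal under Clojure's =."
  [maps]
  (reduce (fn [acc m] (merge-with + acc m)) {} maps))

(combine-counts partial-counts)
;; => {14336 5, 14464 5, 14528 5}
```

The result is an ordinary Clojure map, so each key that is equal under `=` should appear exactly once.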

In the output, keys are often duplicated. I've included a sample of
the result map below so you can see what I mean: the keys 14336,
14464, and 14528 each appear twice, and the repeated entry has a value
of 1 or sometimes 2.

This is coming straight out of merge-with, so it's baffling that there
would be duplicate keys, unless there is some kind of bug in that
function when it is called under heavy load. Another theory is that
something goes wrong during serialization in the intermediate steps;
I am using cascading-clojure. Thanks for your input.

{14336 18176, 14336 2, 14368 19111, 14400 17161, 14432 17015,
 14464 14604, 14464 1, 14496 20810, 14528 18759, 14528 1,
 14560 28086, 13568 162, 14592 31956, 13600 437, 14624 38402,
 13632 281, 14656 38429, 13664 1137, 14688 29531}
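A Clojure map can only hold two entries if their keys are not equal. Keys that print identically in the sample above are therefore presumably distinct values. One possibility, which is an assumption the output above does not confirm, is that they have different types after an intermediate serialization step. The hypothetical helper below reports, for each printed form shared by more than one key, the classes of the underlying keys:

```clojure
;; Hypothetical diagnostic, not part of Clojure or cascading-clojure.
;; It groups a map's keys by their str rendering and keeps only the
;; renderings shared by more than one distinct key, mapping each one to
;; the set of classes of the keys that produced it.
(defn colliding-printed-keys
  [m]
  (->> (keys m)
       (group-by str)
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (map (fn [[printed ks]] [printed (set (map class ks))]))
       (into {})))

;; Illustrative input only: a Long key and a String key that both
;; render as 14336 under str.
(colliding-printed-keys {14336 18176, "14336" 2, 14368 19111})
```

Grouping by `str` is a deliberate choice. Under `str`, the String "14336" renders the same as the number 14336, so the helper catches keys that look alike even when `pr-str` would tell them apart.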

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en