The issue was solved by clearing the HashMap and HashSet at the beginning of the
call method.
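A minimal sketch of the failure mode and the fix described above, in plain Java rather than against the Spark API (class and method names here are hypothetical). Spark reuses a single function object to process many input records, so an instance-level HashSet that is never cleared keeps keys from earlier records and re-emits them as duplicates; resetting the collection at the top of the call method removes the leak.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Spark FlatMapFunction: one object instance,
// invoked once per input record, holding mutable instance-level state.
public class ReusedStateDemo {
    private final Set<String> seen = new HashSet<>();

    // Buggy version: "seen" is never reset, so keys accumulated from
    // earlier records leak into the output of later calls.
    public Set<String> callBuggy(String line) {
        for (String t : line.split("\\s+")) seen.add(t);
        return new HashSet<>(seen);
    }

    // Fixed version, per the solution above: clear the accumulated state
    // at the beginning of the call method before processing the record.
    public Set<String> callFixed(String line) {
        seen.clear();
        for (String t : line.split("\\s+")) seen.add(t);
        return new HashSet<>(seen);
    }

    public static void main(String[] args) {
        ReusedStateDemo buggy = new ReusedStateDemo();
        buggy.callBuggy("a b");
        // Key "a" from the first record leaks into the second record's output.
        System.out.println(buggy.callBuggy("c").contains("a")); // true

        ReusedStateDemo fixed = new ReusedStateDemo();
        fixed.callFixed("a b");
        System.out.println(fixed.callFixed("c").contains("a")); // false
    }
}
```

This also explains why the duplicates only appear at scale: the leak is only visible when the same function instance processes more than one record, which depends on how records are distributed across tasks.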
From: Jacob Maloney [mailto:jmalo...@conversantmedia.com]
Sent: Thursday, October 16, 2014 5:09 PM
To: user@spark.apache.org
Subject: Strange duplicates in data when scaling up
I have a flatMap function that shouldn't possibly emit duplicates, and yet it
does. The output of my function is a HashSet, so the function itself cannot
output duplicates, and yet I see many copies of keys emitted from it (in one
case up to 62). The curious thing is I can't get this to happen until
[message truncated in the archive] can't I access
this map? And what do I have to do to make it accessible?
Thanks,
Jacob