Hi, I'm using spark 2.1.0 in aws emr. Kryo Serializer. I'm broadcasting a java class :
public class NameMatcher { private static final Logger LOG = LoggerFactory.getLogger(NameMatcher.class); private final Splitter splitter; private final SetMultimap<String, IdNamed> itemsByWord; private final Multiset<IdNamed> wordCount; private NameMatcher(Builder builder) { splitter = builder.splitter; itemsByWord = cloneMultiMap(builder.itemsByWord); wordCount = cloneMultiSet(builder.wordCount); LOG.info("Matcher itemsByWorld size: {}", itemsByWord.size()); LOG.info("Matcher wordCount size: {}", wordCount.size()); } private <T> Multiset<T> cloneMultiSet(Multiset<T> multiset) { Multiset<T> result = HashMultiset.create(); result.addAll(multiset); return result; } private <T, U> SetMultimap<T, U> cloneMultiMap(Multimap<T, U> multimap) { SetMultimap<T, U> result = HashMultimap.create(); result.putAll(multimap); return result; } public Set<IdNamed> match(CharSequence text) { LOG.info("itemsByWorld Keys {}", itemsByWord.keys()); LOG.info("QueryMatching: {}", text); Multiset<IdNamed> counter = HashMultiset.create(); Set<IdNamed> result = Sets.newHashSet(); for (String word : Sets.newHashSet(splitter.split(text))) { if (itemsByWord.containsKey(word)) { for (IdNamed item : itemsByWord.get(word)) { counter.add(item); if (wordCount.count(item) == counter.count(item)) { result.add(item); } } } } return result; } } So the logs in the constructor are ok: LOG.info("Matcher itemsByWorld size: {}", itemsByWord.size()); prints itemsByWorld sizes and it's as expected. But when calling: nameMatcher.getValue().match(... in a RDD transformation, the log line in match method: LOG.info("itemsByWorld Keys {}", itemsByWord.keys()); Prints an empty list. This works alright running locally in my computer, but fail with no match running in aws emr. I usually broadcast objects and map with no problems. Can anyone give me a clue about what's happening here? Thanks you very much, Pedro.