Hi All. I need to create a lot of RDDs starting from a set of "roots" and count the rows in each. Something like this:
final JavaSparkContext sc = new JavaSparkContext(conf);
List<String> roots = ...

Map<String, Object> res = sc.parallelize(roots).mapToPair(new PairFunction<String, String, Long>() {
    public Tuple2<String, Long> call(String root) throws Exception {
        ... create RDD based on root from sc somehow ...
        return new Tuple2<String, Long>(root, rdd.count());
    }
}).countByKey();

This fails with a message about JavaSparkContext not being serializable. Is there a way to get at the context inside of the map function, or should I be doing something else entirely?

Thanks,
David
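P.S. To make the question concrete: the only alternative I can think of is to drop the distributed outer job and just loop over the roots on the driver, counting each per-root RDD there so the context never has to be serialized. A rough sketch of what I mean is below; buildRddForRoot() is just a hypothetical stand-in for however the per-root RDD actually gets created.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RootCounts {
    // Driver-side fallback: build and count each per-root RDD sequentially
    // on the driver, so JavaSparkContext is never shipped to executors.
    public static Map<String, Long> countRoots(JavaSparkContext sc, List<String> roots) {
        Map<String, Long> counts = new HashMap<String, Long>();
        for (String root : roots) {
            JavaRDD<String> rdd = buildRddForRoot(sc, root); // hypothetical helper
            counts.put(root, rdd.count());
        }
        return counts;
    }

    private static JavaRDD<String> buildRddForRoot(JavaSparkContext sc, String root) {
        // Placeholder for "create RDD based on root somehow", e.g. reading files under the root.
        return sc.textFile(root);
    }
}

Is that driver-side loop really the intended pattern, or is there a better way?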