Hi All.

I need to create a lot of RDDs starting from a set of "roots" and count the
rows in each. Something like this:

final JavaSparkContext sc = new JavaSparkContext(conf);
List<String> roots = ...
Map<String, Object> res = sc.parallelize(roots).mapToPair(
    new PairFunction<String, String, Long>() {
        public Tuple2<String, Long> call(String root) throws Exception {
            // ... create RDD based on root from sc somehow ...
            return new Tuple2<String, Long>(root, rdd.count());
        }
    }).countByKey();

This fails with a message about JavaSparkContext not being serializable.

Is there a way to get at the context inside the map function, or should I be
doing something else entirely?
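The only "something else" I can think of is to loop over the roots on the
driver, where sc is actually available, and let each count() run as its own
job. A rough sketch of what I mean, using a hypothetical countPerRoot helper
and assuming, just for illustration, that each root is a path sc.textFile can
read:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;

// Illustrative helper: build and count each RDD from the driver,
// where sc can be used. Assumes each root is readable via sc.textFile.
public static Map<String, Long> countPerRoot(JavaSparkContext sc, List<String> roots) {
    Map<String, Long> counts = new HashMap<String, Long>();
    for (String root : roots) {
        // count() is an action, so the counting itself still runs on the cluster
        counts.put(root, sc.textFile(root).count());
    }
    return counts;
}

With a lot of roots that means a lot of separate jobs submitted one after
another, so I'm not sure it's the right approach.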

Thanks
David
