I often run into cases where I have a JavaRDD and want to transform the
data and, depending on a test, return zero or one item (please don't
suggest a filter; the real case is more complex). Currently I do something
like the following: a flatMap that returns a list with 0 or 1 entries,
depending on the isUsed function.
JavaRDD<Foo> original = ...
JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
    @Override
    public Iterable<Foo> call(final Foo s) throws Exception {
        List<Foo> ret = new ArrayList<Foo>();
        if (isUsed(s)) {
            ret.add(transform(s));
        }
        return ret; // contains 0 items if isUsed is false
    }
});
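For reference, the zero-or-one decision inside call can be pulled out into a small helper that avoids allocating an ArrayList per element. This is only a sketch in plain Java with no Spark dependency: the Foo class and the bodies of isUsed and transform below are hypothetical stand-ins for the real ones, and zeroOrOne is a name made up for illustration.

```java
import java.util.Collections;
import java.util.List;

public class ZeroOrOne {
    // Hypothetical stand-in for the real Foo type.
    public static final class Foo {
        public final String value;
        public Foo(String value) { this.value = value; }
    }

    // Hypothetical stand-ins for the real isUsed and transform.
    public static boolean isUsed(Foo s) { return !s.value.isEmpty(); }
    public static Foo transform(Foo s) { return new Foo(s.value.toUpperCase()); }

    // Returns an immutable list holding the transformed item,
    // or an immutable empty list when the item is not used.
    public static List<Foo> zeroOrOne(Foo s) {
        return isUsed(s)
                ? Collections.singletonList(transform(s))
                : Collections.<Foo>emptyList();
    }

    public static void main(String[] args) {
        System.out.println(zeroOrOne(new Foo("abc")).size()); // prints 1
        System.out.println(zeroOrOne(new Foo("")).size());    // prints 0
    }
}
```

With a helper like this, the body of call in the flatMap version would reduce to a single return of zeroOrOne(s).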
My question is: can I instead use a map that returns the transformed data,
or null if nothing is to be returned, as shown below? What does Spark do
with a map function that returns null?
JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
    @Override
    public Foo call(final Foo s) throws Exception {
        if (isUsed(s)) {
            return transform(s);
        }
        return null; // not used - what happens now
    }
});