A number of the problems I want to work with generate datasets that are
too large to hold in memory. This becomes an issue when building a
FlatMapFunction, and also when the data used in combineByKey cannot be held
in memory.
The following is a simple, if a little silly, example of a
FlatMapFunction that returns maxMultiples multiples of a long. It works well
for maxMultiples = 1000, but what happens if maxMultiples = 10 billion?
The issue is that call cannot return a List or any other structure that
is held entirely in memory. (An ArrayList is indexed by int, so it cannot
hold more than about 2^31 elements in any case.) What can it return, or is
there another way to do this?
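import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;

// Nested inside the driver class. Emits l*1, l*2, ..., l*maxMultiples for each
// input Long (Spark 1.x Java API, where FlatMapFunction.call returns an Iterable).
public static class GenerateMultiples implements FlatMapFunction<Long, Long> {
    private final long maxMultiples;

    public GenerateMultiples(final long maxMultiples) {
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterable<Long> call(Long l) {
        // Every multiple is stored in the list before it is returned,
        // so memory use grows linearly with maxMultiples.
        List<Long> holder = new ArrayList<Long>();
        for (long factor = 1; factor <= maxMultiples; factor++) {
            holder.add(l * factor);
        }
        return holder;
    }
}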
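One possible direction, offered as a sketch under assumptions rather than a definitive answer: since call only has to return an Iterable, it can return one that computes each multiple on demand instead of storing them all. LazyMultiples below is a hypothetical class name. In the Spark 1.x Java API, the call method above could then simply return new LazyMultiples(l, maxMultiples); Spark pulls values from the returned Iterable's iterator one at a time, so memory should stay roughly constant as long as nothing downstream (for example cache()) materializes the whole output. In Spark 2.x, FlatMapFunction.call returns an Iterator rather than an Iterable, so the equivalent would return the iterator() of this object.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: a lazy Iterable over base*1, base*2, ..., base*maxMultiples.
// No values are stored; each one is computed when next() is called.
// Like the original code, it does not guard against long overflow.
final class LazyMultiples implements Iterable<Long> {
    private final long base;
    private final long maxMultiples;

    LazyMultiples(long base, long maxMultiples) {
        this.base = base;
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterator<Long> iterator() {
        // A fresh iterator per call, so the Iterable can be traversed more than once.
        return new Iterator<Long>() {
            private long factor = 1;

            @Override
            public boolean hasNext() {
                return factor <= maxMultiples;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return base * factor++;
            }

            @Override
            public void remove() {
                // Explicit override keeps this compatible with Java 7.
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        // Prints 7, 14, 21, 28, 35, one per line. The same loop with a
        // maxMultiples of 10 billion would run in constant memory, just slowly.
        for (long value : new LazyMultiples(7L, 5L)) {
            System.out.println(value);
        }
    }
}
```

The design choice is simply to move the loop from call into the iterator's next(), so the work happens as Spark consumes the elements rather than before call returns.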