say source is HDFS,And file is divided in 10 partitions. so what will be input contains.
public Iterable<Integer> call(Iterator<String> input) say I have 10 executors in job each having single partition. will it have some part of partition or complete. And if some when I call input.next() - it will fetch rest or how is it handled ? On Thu, Jun 25, 2015 at 3:11 PM, Sean Owen <so...@cloudera.com> wrote: > No, or at least, it depends on how the source of the partitions was > implemented. > > On Thu, Jun 25, 2015 at 12:16 PM, Shushant Arora > <shushantaror...@gmail.com> wrote: > > Does mapPartitions keep complete partitions in memory of executor as > > iterable. > > > > JavaRDD<String> rdd = jsc.textFile("path"); > > JavaRDD<Integer> output = rdd.mapPartitions(new > > FlatMapFunction<Iterator<String>, Integer>() { > > > > public Iterable<Integer> call(Iterator<String> input) > > throws Exception { > > List<Integer> output = new ArrayList<Integer>(); > > while(input.hasNext()){ > > output.add(input.next().length()); > > } > > return output; > > } > > > > }); > > > > > > Here does input is present in memory and can contain complete partition > of > > gbs ? > > Will this function call(Iterator<String> input) is called only for no of > > partitions(say if I have 10 in this example) times. Not no of lines > > times(say 10000000) . > > > > > > And whats the use of mapPartitionsWithIndex ? > > > > Thanks > > >