Say the source is HDFS and the file is divided into 10 partitions. What
will input contain?

public Iterable<Integer> call(Iterator<String> input)

Say I have 10 executors in the job, each holding a single partition.

Will input hold only part of the partition, or the complete partition? And if
only part, will calling input.next() fetch the rest, or how is that handled?
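
To make the question concrete, here is a minimal sketch of the lazy case, assuming the source hands call() a streaming iterator (as far as I understand, textFile over HDFS reads records as next() is called, but that is exactly the part that depends on the source). No Spark is involved; LazyPartitionSketch, CountingIterator and lengths are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LazyPartitionSketch {

    /** Wraps a source iterator and counts how many records were pulled from it. */
    public static final class CountingIterator implements Iterator<String> {
        private final Iterator<String> source;
        private int pulled = 0;

        public CountingIterator(Iterator<String> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public String next() {
            pulled++; // one record leaves the source per call
            return source.next();
        }

        public int pulled() {
            return pulled;
        }
    }

    /** Maps each line to its length lazily: a record is read only when the result is consumed. */
    public static Iterator<Integer> lengths(final Iterator<String> input) {
        return new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public Integer next() {
                return input.next().length();
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for one partition; a real source would stream from storage.
        List<String> partition = Arrays.asList("a", "bb", "ccc");
        CountingIterator input = new CountingIterator(partition.iterator());

        Iterator<Integer> out = lengths(input);
        System.out.println("pulled before consuming: " + input.pulled());
        System.out.println("first length: " + out.next());
        System.out.println("pulled after one next(): " + input.pulled());
    }
}
```

In this sketch the counter stays at zero until the first next() on the result, so only one record at a time has to be held. Collecting every result into an ArrayList, as in the code quoted below, keeps the whole output in memory even when the input itself streams.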





On Thu, Jun 25, 2015 at 3:11 PM, Sean Owen <so...@cloudera.com> wrote:

> No, or at least, it depends on how the source of the partitions was
> implemented.
>
> On Thu, Jun 25, 2015 at 12:16 PM, Shushant Arora
> <shushantaror...@gmail.com> wrote:
> > Does mapPartitions keep the complete partition in the executor's memory
> > as an iterable?
> >
> > JavaRDD<String> rdd = jsc.textFile("path");
> > JavaRDD<Integer> output = rdd.mapPartitions(
> >     new FlatMapFunction<Iterator<String>, Integer>() {
> >
> >       public Iterable<Integer> call(Iterator<String> input)
> >           throws Exception {
> >         List<Integer> output = new ArrayList<Integer>();
> >         while (input.hasNext()) {
> >           output.add(input.next().length());
> >         }
> >         return output;
> >       }
> >
> >     });
> >
> >
> > Is input held in memory here, and can it contain a complete partition
> > of several GB?
> > Is call(Iterator<String> input) invoked only once per partition (10
> > times in this example), rather than once per line (say 10000000
> > times)?
> >
> >
> > And what is the use of mapPartitionsWithIndex?
> >
> > Thanks
> >
>
