3. Also, can mapPartitions run out of memory if I return an ArrayList holding
the whole processed partition? What is the alternative if this can fail?
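
For example, would wrapping the input iterator like this avoid the buffering?
(Just a sketch, assuming Spark 2.x's Java API where FlatMapFunction.call
returns an Iterator; ComplexObject and process() are stand-ins for my real
class:)

import java.util.Iterator;
import org.apache.spark.api.java.function.FlatMapFunction;

rdd2 = rdd1.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterator<String> call(final Iterator<String> records) {
        final ComplexObject helper = new ComplexObject(); // once per partition
        // Lazy view over the input: each record is transformed only when
        // the consumer pulls it, so the partition is never fully buffered.
        return new Iterator<String>() {
            @Override
            public boolean hasNext() { return records.hasNext(); }
            @Override
            public String next() { return helper.process(records.next()); }
        };
    }
});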

On Fri, Jan 27, 2017 at 9:32 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> Hi
>
> I have two transformations in series.
>
> rdd1 = sourcerdd.map(new Function(...)); //step1
> rdd2 = rdd1.mapPartitions(new Function(...)); //step2
>
> 1. Are map and mapPartitions narrow dependencies? Does Spark optimise the DAG
> and execute step 1 and step 2 in a single stage, or will there be two stages?
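>
> (A quick way I found to check, if I've understood toDebugString's output
> right: narrow dependencies print within one indentation level, and each new
> level marks a shuffle, i.e. a stage boundary:)
>
> System.out.println(rdd2.toDebugString());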
>
> Basically, I have a requirement to use a complex object in step 2 which I
> don't want to instantiate for each record, so I have used mapPartitions at
> step 2.
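>
> Roughly what I am doing now (a sketch; ComplexObject and process() stand in
> for my real code):
>
> rdd2 = rdd1.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
>     @Override
>     public Iterator<String> call(Iterator<String> records) {
>         ComplexObject helper = new ComplexObject(); // once per partition
>         List<String> out = new ArrayList<>();
>         while (records.hasNext()) {
>             out.add(helper.process(records.next())); // buffers whole partition
>         }
>         return out.iterator();
>     }
> });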
>
> 2. If I also have a requirement to share a single complex object across all
> tasks on the same executor node, is making the object a singleton fine there?
> Since Java discourages singletons, is it fine to use one here, or is there a
> better alternative?
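>
> For instance, would a lazily initialised holder like this be acceptable?
> (Sketch only; since all tasks on an executor would share the one instance,
> ComplexObject would have to be thread-safe:)
>
> public class SharedHelper {
>     // One instance per executor, since each executor runs in its own JVM
>     // and static state is not shared across JVMs.
>     private static ComplexObject instance;
>
>     public static synchronized ComplexObject get() {
>         if (instance == null) {
>             instance = new ComplexObject();
>         }
>         return instance;
>     }
> }
>
> // inside the map/mapPartitions function:
> ComplexObject helper = SharedHelper.get();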
>
> Thanks
>
