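You cannot reference an RDD inside a closure passed to a transformation or
action on another RDD. RDD operations can only be invoked from the driver,
not from code running on the executors. Compute the count on the driver
first and use the result in the closure: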

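    val rdd1 = sc.parallelize(1 to 10)
    val rdd2 = sc.parallelize(11 to 20)
    // Run the action once, on the driver; the result is a plain Long.
    val rdd2Count = rdd2.count()
    // The closure now captures only that Long, not rdd2 itself.
    rdd1.map { i =>
      rdd2Count
    }.foreach(println(_))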
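
If the closure needs the contents of rdd2 rather than just its count, the
same idea applies: bring the data to the driver first and ship a plain value
to the executors. The following is only a sketch, assuming rdd2 is small
enough to fit in memory on the driver and on each executor. The object name
BroadcastSketch, the method membership, and the local[2] master are made up
for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  // For each element i of rdd1, report whether i + 10 occurs in rdd2.
  def membership(sc: SparkContext): Array[Boolean] = {
    val rdd1 = sc.parallelize(1 to 10)
    val rdd2 = sc.parallelize(11 to 20)
    // collect() runs on the driver; broadcast ships the resulting Set
    // to the executors as a read-only value.
    val rdd2Values = sc.broadcast(rdd2.collect().toSet)
    // The closure references the broadcast value, never rdd2.
    rdd1.map(i => rdd2Values.value.contains(i + 10)).collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for trying the sketch out.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try membership(sc).foreach(println)
    finally sc.stop()
  }
}
```

If rdd2 is too large to collect, a join between the two RDDs is the usual
alternative.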

Also note that even if your original code had worked, it would have
re-counted rdd2 for every single record in rdd1. Unless an RDD is cached /
persisted, it is recomputed every time an action such as count is called on
it, so that would be extremely inefficient.
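
If you really do need to run several actions on the same RDD, caching it
avoids the recomputation. A minimal sketch, assuming the cached partitions
fit in executor memory. The object name CacheCountSketch, the method
countTwice, and the map(_ * 2) step standing in for an expensive computation
are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheCountSketch {
  // Counts the same RDD twice and returns both results.
  def countTwice(sc: SparkContext): (Long, Long) = {
    // map(_ * 2) is a stand-in for a transformation that is costly to redo.
    val rdd2 = sc.parallelize(11 to 20).map(_ * 2).cache()
    // The first action computes the partitions and stores them.
    val first = rdd2.count()
    // Later actions can reuse the stored partitions instead of recomputing.
    val second = rdd2.count()
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for trying the sketch out.
    val conf = new SparkConf().setAppName("cache-count-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(countTwice(sc))
    finally sc.stop()
  }
}
```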


On Thu, Nov 13, 2014 at 2:28 PM, Simone Franzini <captainfr...@gmail.com>
wrote:

> The following code fails with NullPointerException in RDD class on the
> count function:
>
>     val rdd1 = sc.parallelize(1 to 10)
>     val rdd2 = sc.parallelize(11 to 20)
>     rdd1.map{ i =>
>          rdd2.count
>     }
>     .foreach(println(_))
>
> The same goes for any other action I am trying to perform inside the map
> statement. I am failing to understand what I am doing wrong.
> Can anyone help with this?
>
> Thanks,
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>



-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

54 W 40th St, New York, NY 10018
E: daniel.siegm...@velos.io W: www.velos.io
