You cannot reference an RDD within a closure passed to another RDD. Your code should instead look like this:

val rdd1 = sc.parallelize(1 to 10)
val rdd2 = sc.parallelize(11 to 20)
val rdd2Count = rdd2.count
rdd1.map{ i =>
  rdd2Count
}
.foreach(println(_))

You should also note that even if your original code did work, you would be
re-counting rdd2 for every single record in rdd1. Unless your RDD is cached /
persisted, the count will be recomputed every time it is called. So that would
be extremely inefficient.

On Thu, Nov 13, 2014 at 2:28 PM, Simone Franzini <captainfr...@gmail.com> wrote:

> The following code fails with NullPointerException in RDD class on the
> count function:
>
> val rdd1 = sc.parallelize(1 to 10)
> val rdd2 = sc.parallelize(11 to 20)
> rdd1.map{ i =>
>   rdd2.count
> }
> .foreach(println(_))
>
> The same goes for any other action I am trying to perform inside the map
> statement. I am failing to understand what I am doing wrong.
> Can anyone help with this?
>
> Thanks,
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

54 W 40th St, New York, NY 10018
E: daniel.siegm...@velos.io W: www.velos.io