Hi,

I hope someone can help, as I'm not sure if I'm using Spark correctly.
Basically, in the simple example below I create an RDD that is just a sequence of
random numbers. I then have a loop in which I repeatedly invoke rdd.count(), and
what I can see is that memory use steadily creeps upwards.

If I attach YourKit to the JVM, I can see the garbage collector in action, but 
eventually the JVM runs out of memory.

Can anyone spot whether I am doing something wrong? (Obviously the example is
slightly contrived, but basically I have an RDD with a set of numbers and I'd like
to submit lots of jobs that each perform some calculation over it; this was the
simplest case I could create that exhibits the same memory issue. A rough sketch of
the real workload follows.)
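
To give an idea of the real workload, it would look something like the sketch
below: the same RDD is reused by many jobs, and each job maps a calculation over
the elements and reduces the results. expensiveCalculation here is just a
placeholder for whatever per-element work the job would actually do.

import org.apache.spark.rdd.RDD

object Workload {
  // Placeholder for the real per-element calculation.
  def expensiveCalculation(x: Int): Double = math.sqrt(math.abs(x).toDouble)

  // Submit numJobs separate Spark jobs against the same RDD, each one
  // mapping the calculation over the data and reducing to a single result.
  def runJobs(rdd: RDD[Int], numJobs: Int): Seq[Double] =
    (1 to numJobs).map { _ => rdd.map(expensiveCalculation).reduce(_ + _) }
}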

Regards & Thanks,
Mike


import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object SparkTest {
  def main(args: Array[String]) {
    println("spark memory test")

    val jars = Seq("spark-test-1.0-SNAPSHOT.jar")

    val sparkConfig: SparkConf = new SparkConf()
                                 .setMaster("local")
                                 .setAppName("tester")
                                 .setJars(jars)

    val sparkContext = new SparkContext(sparkConfig)

    // Build an RDD of 1.2 million random ints, split into 10 partitions.
    val list = Seq.fill(1200000)(Random.nextInt)
    val rdd: RDD[Int] = sparkContext.makeRDD(list, 10)

    // Each iteration submits a new count() job against the same RDD;
    // heap usage climbs with every iteration until the JVM runs out of memory.
    for (i <- 1 to 1000000) {
      rdd.count()
    }

    sparkContext.stop()
  }
}
