Hi, I hope someone can help, as I'm not sure whether I'm using Spark correctly. In the simple example below I create an RDD that is just a sequence of random numbers, and then loop repeatedly invoking rdd.count(). What I can see is that memory use always nudges upwards.
If I attach YourKit to the JVM I can see the garbage collector in action, but eventually the JVM runs out of memory. Can anyone spot whether I am doing something wrong? (Obviously the example is slightly contrived, but essentially I have an RDD holding a set of numbers and I'd like to submit lots of jobs that each perform some calculation over it; count() in a loop was the simplest case I could create that exhibits the same memory issue.)

Regards & Thanks,
Mike

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

import scala.util.Random

object SparkTest {

  def main(args: Array[String]): Unit = {
    println("spark memory test")

    val jars = Seq("spark-test-1.0-SNAPSHOT.jar")

    val sparkConfig: SparkConf = new SparkConf()
      .setMaster("local")
      .setAppName("tester")
      .setJars(jars)

    val sparkContext = new SparkContext(sparkConfig)

    // An RDD of 1.2 million random integers, split across 10 partitions.
    val list = Seq.fill(1200000)(Random.nextInt)
    val rdd: RDD[Int] = sparkContext.makeRDD(list, 10)

    // Submit the same trivial job repeatedly; memory use climbs until the JVM runs out.
    for (i <- 1 to 1000000) {
      rdd.count()
    }

    sparkContext.stop()
  }
}
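
(For context, my real workload is closer to the sketch below: the RDD stays fixed and I submit lots of small jobs against it. The calculation shown is made up purely to illustrate the pattern; it exhibits the same steady memory growth as the count() loop above.)

    // Illustrative only: some arbitrary per-job calculation over the same RDD.
    for (i <- 1 to 1000000) {
      val sum = rdd.map(x => x.toLong * x).reduce(_ + _)
      if (i % 1000 == 0) println(s"job $i, result: $sum")
    }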