Hi,
I hope someone can help, as I’m not sure whether I’m using Spark correctly.
Basically, in the simple example below, I create an RDD that is just a sequence of
random numbers. I then have a loop that does nothing but invoke rdd.count().
What I can see is that memory use steadily nudges upwards. If I attach YourKit to
the JVM, I can see the garbage collector at work, but eventually the JVM runs out
of memory.
Can anyone spot whether I am doing something wrong? (Obviously the example is
slightly contrived, but basically I have an RDD with a set of numbers, and I’d like
to submit lots of jobs that perform some calculation on it. This was the simplest
case I could create that exhibits the same memory issue.)
Regards & Thanks,
Mike
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}
import scala.util.Random

object SparkTest {
  def main(args: Array[String]): Unit = {
    println("spark memory test")
    val jars = Seq("spark-test-1.0-SNAPSHOT.jar")
    val sparkConfig: SparkConf = new SparkConf()
      .setMaster("local")
      .setAppName("tester")
      .setJars(jars)
    val sparkContext = new SparkContext(sparkConfig)

    // 1.2 million random integers, split into 10 partitions
    val list = Seq.fill(1200000)(Random.nextInt())
    val rdd: RDD[Int] = sparkContext.makeRDD(list, 10)

    // Submit the same trivial job a million times; memory use nudges upwards over time
    for (_ <- 1 to 1000000) {
      rdd.count()
    }

    sparkContext.stop()
  }
}
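To put numbers on the upward drift without attaching YourKit, one option is to sample used heap from inside the loop. The sketch below assumes only the standard `java.lang.Runtime` API; `HeapSampler`, `sample` and `usedHeapMiB` are hypothetical names for illustration, not part of Spark.

```scala
// Hypothetical helper (not part of Spark): samples used JVM heap while a body runs repeatedly.
object HeapSampler {
  private val MiB = 1024L * 1024L

  /** Currently used heap in MiB (allocated heap minus free heap), via java.lang.Runtime. */
  def usedHeapMiB(): Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory() - rt.freeMemory()) / MiB
  }

  /** Runs `body` `iterations` times and records (iteration, used MiB) after every `every`-th run. */
  def sample(iterations: Int, every: Int)(body: => Unit): Vector[(Int, Long)] = {
    require(iterations >= 0 && every > 0, "iterations must be >= 0 and every > 0")
    val samples = Vector.newBuilder[(Int, Long)]
    for (i <- 1 to iterations) {
      body
      if (i % every == 0) {
        // Ask for a GC so the sample reflects retained objects rather than short-lived garbage;
        // System.gc() is only a hint and the JVM is free to ignore it.
        System.gc()
        samples += (i -> usedHeapMiB())
      }
    }
    samples.result()
  }
}
```

Used in place of the plain loop, e.g. `HeapSampler.sample(1000000, 10000)(rdd.count())`, it returns (iteration, MiB) pairs. A series that keeps rising even after the GC requests would suggest that something is being retained across jobs, rather than the collector simply lagging behind ordinary garbage.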