Hi Andre, thanks a lot for your reply, but I still get the same exception. The
complete exception message is below:
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 1.0:9 failed 4 times (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
According to your hints, I added SPARK_DRIVER_MEMORY to my spark-env.sh:

    export SPARK_MASTER_IP=192.168.2.184
    export SPARK_MASTER_PORT=7077
    export SPARK_LOCAL_IP=192.168.2.183
    export SPARK_DRIVER_MEMORY=10G
    export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g"
I also modified my code; I no longer call the collect method. Here is my code:

    def main(args: Array[String]) {
      val sc = new SparkContext("spark://192.168.2.184:7077", "Score Calcu Total",
        "/usr/local/spark-0.9.1-bin-hadoop2", Seq("/home/deployer/myjar.jar"))
      val mongoRDD = sc.textFile("/home/deployer/uris.dat", 200)
      val jsonRDD  = mongoRDD.map(arg => new JSONObject(arg))

      // attach a constant score to every JSON record
      val newRDD = jsonRDD.map { arg =>
        var score = 0.5
        arg.put("score", score)
        arg
      }

      // (rid, (zid, score)) pairs grouped by rid
      val resourceScoresRDD = newRDD.map(arg =>
        (arg.get("rid").toString.toLong,
         (arg.get("zid").toString, arg.get("score").asInstanceOf[Number].doubleValue))
      ).groupByKey()

      // one record per (rid1, rid2) pair with rid1 > rid2
      val simRDD = resourceScoresRDD.cartesian(resourceScoresRDD)
        .filter(arg => arg._1._1 > arg._2._1)
        .map(arg => (arg._1._1, arg._2._1, 0.8))

      simRDD.saveAsTextFile("/home/deployer/sim")
    }
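(Illustrative addition, not in the original mail: cartesian() on the grouped RDD produces roughly N × N pairs for N distinct rids, so a quick count before that step shows the scale of the job. The name numResources is chosen here for illustration only.)

    // Rough size check before the cartesian step, building on resourceScoresRDD above.
    // With N distinct rids the cartesian product has N * N elements, so even a
    // moderate N can exhaust executor memory.
    val numResources = resourceScoresRDD.count()
    println("distinct rids: " + numResources +
      ", cartesian pairs: " + numResources * numResources)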
I ran the program with "java -jar myjar.jar"; it crashed quickly, but it
succeeded when the data file was small.
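(Illustrative addition: printing the heap the driver JVM actually received, for example right after the SparkContext is created, would show whether the memory settings are taking effect when the driver is started with plain java. This is a standard JVM call, not Spark-specific.)

    // How much heap did this (driver) JVM actually get?
    println("driver max heap (MB): " + Runtime.getRuntime.maxMemory() / (1024 * 1024))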
Thanks for your help!
qinwei
From: Andre Bois-Crettez [via Apache Spark User List]
Date: 2014-04-16 17:50
To: Qin Wei
Subject: Re: Spark program throws OutOfMemoryError
It seems you do not have enough memory on the Spark driver. Hints below:
On 2014-04-15 12:10, Qin Wei wrote:
> val resourcesRDD = jsonRDD.map(arg =>
> arg.get("rid").toString.toLong).distinct
>
> // the program crashes at this line of code
> val bcResources = sc.broadcast(resourcesRDD.collect.toList)
what is returned by resourcesRDD.count()?
> The data file “/home/deployer/uris.dat” is 2G with lines like this : {
> "id" : 1, "a" : { "0" : 1 }, "rid" : 5487628, "zid" : "10550869" }
>
> And here is my spark-env.sh
> export SCALA_HOME=/usr/local/scala-2.10.3
> export SPARK_MASTER_IP=192.168.2.184
> export SPARK_MASTER_PORT=7077
> export SPARK_LOCAL_IP=192.168.2.182
> export SPARK_WORKER_MEMORY=20g
> export SPARK_MEM=10g
> export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g
> -XX:-UseGCOverheadLimit"
Try setting SPARK_DRIVER_MEMORY to a bigger value, as the default 512m is
probably too small for the resourcesRDD.collect().
By the way, are you really sure you need to collect all that ?
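(Sketch of the pattern this hint refers to, using the names from the code quoted above: collect() materialises the whole RDD in the driver JVM before broadcast() copies it out to the executors, which is why the driver heap, 512m by default, is the limit here.)

    // Everything in resourcesRDD is pulled into the driver heap first...
    val allRids = resourcesRDD.collect().toList
    // ...and only then shipped to each executor as a broadcast variable.
    val bcResources = sc.broadcast(allRids)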
André Bois-Crettez
Software Architect
Big Data Developer
http://www.kelkoo.com/