I would second the suggestion that one of the Spark committers weigh in.

The repartition() command often fails for me, no matter how many times I
rerun it.

This is more of a 0.x behavior than a 1.0.2 behavior.

Anyone?
Dale.




On 10/8/14, 1:06 AM, "Paul Wais" <pw...@yelp.com> wrote:

>Looks like an OOM issue.  Have you tried persisting your RDDs with a
>storage level that spills to disk (e.g. MEMORY_AND_DISK)?
>
>I've seen a lot of similar crashes in a Spark app that reads from HDFS
>and does joins, e.g. "java.io.IOException: Filesystem closed,"
>"Executor lost," "FetchFailed," etc., all failing
>non-deterministically.  I've tried persisting RDDs, tuning other
>params, and verifying that the Executor JVMs don't come close to their
>max allocated memory during operation.
>
>Looking through user@ tonight, there are a ton of email threads with
>similar crashes and no answers.  It looks like a lot of people are
>struggling with OOMs.
>
>Could one of the Spark committers please comment on this thread, or
>one of the other unanswered threads with similar crashes?  Is this
>simply how Spark behaves if Executors OOM?  What can the user do other
>than increase memory or reduce RDD size?  (And how can one deduce how
>much of either is needed?)
>
>One general workaround for OOMs could be to programmatically break the
>job input (e.g. from HDFS, or from #parallelize()) into chunks, and
>only create/process the RDDs for one chunk at a time.  However, this
>approach has the same limitations as Spark Streaming, and no formal
>library support.  It would be nice if Spark could try to re-partition
>automatically when tasks fail, in order to avoid OOMs.
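A minimal sketch of the chunking workaround described above (plain Python; the `chunk` helper and the commented-out Spark calls are illustrative assumptions, not an established API):

```python
def chunk(items, n):
    """Split a list (e.g. HDFS part-file paths) into n roughly equal chunks."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

# Hypothetical input: one path per HDFS part file.
paths = ["hdfs:///data/part-%05d" % i for i in range(10)]

for c in chunk(paths, 3):
    # With a live SparkContext `sc`, each chunk would be processed on its
    # own, so only one chunk's RDD is held at a time, e.g.:
    #   rdd = sc.textFile(",".join(c)).persist(StorageLevel.MEMORY_AND_DISK)
    #   ... run the job on `rdd`, write results, then rdd.unpersist() ...
    print(len(c))  # prints 4, then 3, then 3
```

The same idea should work for #parallelize() input: chunk the driver-side collection instead of the path list.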
>
>
>
>On Fri, Oct 3, 2014 at 2:55 AM, jamborta <jambo...@gmail.com> wrote:
>> I have two nodes with 96G RAM and 16 cores; my setup is as follows:
>>
>>     conf = (SparkConf()
>>             .setMaster("yarn-cluster")
>>             .set("spark.executor.memory", "30G")
>>             .set("spark.cores.max", 32)
>>             .set("spark.executor.instances", 2)
>>             .set("spark.executor.cores", 8)
>>             .set("spark.akka.timeout", 10000)
>>             .set("spark.akka.askTimeout", 100)
>>             .set("spark.akka.frameSize", 500)
>>             .set("spark.cleaner.ttl", 86400)
>>             .set("spark.task.maxFailures", 16)
>>             .set("spark.worker.timeout", 150))
>>
>> thanks a lot,
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Any-issues-with-repartition-tp13462p15674.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
>

