So the fetch failure error is gone? Can you paste the code that you are
running? What is the size of your data, and what does your cluster setup look like?

Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:16 PM, Ma,Xi <m...@baidu.com> wrote:
>
>  Hi Das,
>
>
>
> Thanks for your advice.
>
>
>
> I'm not sure what setting memoryFraction to 1 is supposed to achieve. I've
> rerun the test with the following parameters in spark_default.conf, but it
> failed again:
>
>
>
> spark.rdd.compress  true
>
> spark.akka.frameSize  50
>
> spark.storage.memoryFraction 0.8
>
> spark.core.connection.ack.wait.timeout 6000
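>
> (A quick way to double-check that those properties were actually picked up
> is to print the effective configuration from the PySpark driver. This is
> only a sketch and assumes your build exposes SparkContext.getConf(); the
> Environment tab of the web UI shows the same information.)
>
>     # run in the pyspark shell; 'sc' is the SparkContext the shell creates
>     for key, value in sorted(sc.getConf().getAll()):
>         print key, "=", value   # Python 2 print, matching the Spark 1.1 era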
>
>
>
> 14/12/16 16:45:08 ERROR PythonRDD: Python worker exited unexpectedly
> (crashed)
>
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>
>   File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
>
>     command = pickleSer._read_with_length(infile)
>
>   File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in
> _read_with_length
>
>     length = read_int(stream)
>
>   File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in
> read_int
>
>     raise EOFError
>
> EOFError
>
>          at
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
>
>          at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
>
>          at
> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
>
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>
>          at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>
>          at org.apache.spark.scheduler.Task.run(Task.scala:54)
>
>          at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>
>          at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>          at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>          at java.lang.Thread.run(Thread.java:662)
>
>
>
> I suspect that something is going wrong in the shuffle stage, but I'm not
> sure what the error is.
>
>
>
> Thanks,
>
>
>
> Mars
>
>
>
>
>
> *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
> *Sent:* December 16, 2014 14:57
> *To:* Ma,Xi
> *Cc:* u...@spark.incubator.apache.org
> *Subject:* Re: Fetch Failed caused job failed.
>
>
>
> You could try setting the following while creating the SparkContext:
>
>
>
>       .set("spark.rdd.compress", "true")
>
>       .set("spark.storage.memoryFraction", "1")
>
>       .set("spark.core.connection.ack.wait.timeout", "600")
>
>       .set("spark.akka.frameSize", "50")
>
>
>
>
>   Thanks
>
> Best Regards
>
>
>
> On Tue, Dec 16, 2014 at 8:30 AM, Mars Max <m...@baidu.com> wrote:
>
> While I was running a Spark MR job, a task hit FetchFailed(BlockManagerId(47,
> xxxxxxxxxx.com, 40975, 0), shuffleId=2, mapId=5, reduceId=286); after many
> retries, the job finally failed.
>
> The log showed the following error. Has anybody run into this error, or is
> it a known issue in Spark? Thanks.
>
> 14/12/16 10:43:43 ERROR PythonRDD: Python worker exited unexpectedly
> (crashed)
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
>     command = pickleSer._read_with_length(infile)
>   File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in
> _read_with_length
>     length = read_int(stream)
>   File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in
> read_int
>     raise EOFError
> EOFError
>
>         at
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
>         at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
>         at
> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at
> org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:265)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.spark.shuffle.FetchFailedException: Fetch failed:
> BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>         at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>         at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior
> exception:
> org.apache.spark.shuffle.FetchFailedException: Fetch failed:
> BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
>         at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>         at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>         at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task
> 18305
> 14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID
> 18305)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Fetch-Failed-caused-job-failed-tp20697.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
