Actually, there was still a fetch failure. However, after I upgraded Spark to
1.1.1, the error did not occur again.

Thanks,
Mars


From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: December 16, 2014 17:52
To: Ma,Xi
Cc: u...@spark.incubator.apache.org
Subject: Re: Re: Fetch Failed caused job failed.

So the fetch failure error is gone? Can you paste the code that you are 
executing? What is the size of the data and your cluster setup?

Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:16 PM, Ma,Xi <m...@baidu.com> wrote:
Hi Das,

Thanks for your advice.

I'm not sure what setting memoryFraction to 1 would accomplish. I reran the
test with the following parameters in spark-defaults.conf, but it failed
again:

spark.rdd.compress  true
spark.akka.frameSize  50
spark.storage.memoryFraction 0.8
spark.core.connection.ack.wait.timeout 6000

14/12/16 16:45:08 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in 
_read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in 
read_int
    raise EOFError
EOFError
         at 
org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
         at 
org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
         at org.apache.spark.scheduler.Task.run(Task.scala:54)
         at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)

I suspect something is going wrong in the shuffle stage, but I'm not sure
what the error is.
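
One quick sanity check is to confirm that the values in spark-defaults.conf
were actually picked up by the running context. Below is a minimal PySpark
sketch of that check; the app name is a placeholder I made up, and
toDebugString just prints the effective configuration:

    from pyspark import SparkConf, SparkContext

    # Sketch only: print the configuration the context actually received,
    # including anything loaded from spark-defaults.conf. "conf-check" is
    # a made-up app name, not from this thread.
    conf = SparkConf().setAppName("conf-check")
    sc = SparkContext(conf=conf)
    print(conf.toDebugString())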

Thanks,

Mars


From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: December 16, 2014 14:57
To: Ma,Xi
Cc: u...@spark.incubator.apache.org
Subject: Re: Fetch Failed caused job failed.

You could try setting the following while creating the SparkContext:


      .set("spark.rdd.compress","true")

      .set("spark.storage.memoryFraction","1")

      .set("spark.core.connection.ack.wait.timeout","600")

      .set("spark.akka.frameSize","50")


Thanks
Best Regards

On Tue, Dec 16, 2014 at 8:30 AM, Mars Max <m...@baidu.com> wrote:
While I was running a Spark MR job, there was FetchFailed(BlockManagerId(47,
xxxxxxxxxx.com, 40975, 0), shuffleId=2, mapId=5, reduceId=286); after many
retries, the job finally failed.

The log showed the following error. Has anybody seen this error, or is it a
known issue in Spark? Thanks.

14/12/16 10:43:43 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:265)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior exception:
org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task 18305
14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID 18305)


