Actually, there was still a fetch failure. However, after I upgraded Spark to 1.1.1, this error did not occur again.
Thanks,
Mars

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: December 16, 2014 17:52
To: Ma,Xi
Cc: u...@spark.incubator.apache.org
Subject: Re: Re: Fetch Failed caused job failed.

So the fetch failure error is gone? Can you paste the code that you are executing? What is the size of the data, and what is your cluster setup?

Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:16 PM, Ma,Xi <m...@baidu.com> wrote:

Hi Das,

Thanks for your advice. I'm not sure what setting memoryFraction to 1 is meant to achieve, though. I tried to rerun the test with the following parameters in spark-defaults.conf, but it failed again:

    spark.rdd.compress true
    spark.akka.frameSize 50
    spark.storage.memoryFraction 0.8
    spark.core.connection.ack.wait.timeout 6000

14/12/16 16:45:08 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError
        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

I suspect something is going wrong in the shuffle stage, but I'm not sure what the error is.

Thanks,
Mars

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: December 16, 2014 14:57
To: Ma,Xi
Cc: u...@spark.incubator.apache.org
Subject: Re: Fetch Failed caused job failed.

You could try setting the following while creating the SparkContext:

    .set("spark.rdd.compress", "true")
    .set("spark.storage.memoryFraction", "1")
    .set("spark.core.connection.ack.wait.timeout", "600")
    .set("spark.akka.frameSize", "50")

Thanks
Best Regards
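For a PySpark job like the one in the tracebacks in this thread, the same settings can be applied through SparkConf when the SparkContext is constructed. A minimal sketch; the app name "FetchFailedRepro" is just a placeholder, not something from the original thread:

    from pyspark import SparkConf, SparkContext

    # Apply the suggested settings programmatically; the values mirror
    # the .set(...) calls quoted above.
    conf = (SparkConf()
            .setAppName("FetchFailedRepro")  # placeholder app name
            .set("spark.rdd.compress", "true")
            .set("spark.storage.memoryFraction", "1")
            .set("spark.core.connection.ack.wait.timeout", "600")
            .set("spark.akka.frameSize", "50"))
    sc = SparkContext(conf=conf)

Settings passed this way take precedence over spark-defaults.conf, so they can be used to override the cluster-wide values for a single job.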
On Tue, Dec 16, 2014 at 8:30 AM, Mars Max <m...@baidu.com> wrote:

While I was running a Spark MR job, there was FetchFailed(BlockManagerId(47, xxxxxxxxxx.com, 40975, 0), shuffleId=2, mapId=5, reduceId=286), then there were many retries, and the job finally failed. The log showed the following error. Has anybody met this error, or is it a known issue in Spark? Thanks.

14/12/16 10:43:43 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError
        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:265)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior exception:
org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task 18305
14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID 18305)
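One note on the EOFError in the Python tracebacks above: the log itself says it "may have been caused by a prior exception", and that is the right reading. read_int in pyspark/serializers.py raises EOFError as soon as the JVM side closes the stream, which is exactly what happens when the JVM writer thread dies on the FetchFailedException. A simplified paraphrase of that helper (based on the Spark 1.1 source referenced in the traces; not a verbatim copy):

    import struct

    def read_int(stream):
        # Read a 4-byte big-endian int; in _read_with_length this is the
        # length prefix of the next pickled payload from the JVM.
        length = stream.read(4)
        if not length:
            # The stream ended before any bytes arrived, e.g. because the
            # JVM writer thread died on a prior FetchFailedException.
            raise EOFError
        return struct.unpack("!i", length)[0]

So the fetch failure on the shuffle data is the error worth chasing; the Python worker crash merely follows from it.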