external shuffle service is not enabled

________________________________
best regards,
-zhenhua

From: Hamel Kothari<mailto:hamelkoth...@gmail.com>
Date: 2016-01-27 22:21
To: Ted Yu<mailto:yuzhih...@gmail.com>; wangzhenhua 
(G)<mailto:wangzhen...@huawei.com>
CC: dev<mailto:dev@spark.apache.org>
Subject: Re: timeout in shuffle problem
Are you running on YARN? Another possibility here is that your shuffle managers 
are facing GC pain and becoming less responsive, thus missing timeouts. Can you 
try increasing the memory on the node managers and see if that helps?

On Sun, Jan 24, 2016 at 4:58 PM Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:
Cycling past bits:
http://search-hadoop.com/m/q3RTtU5CRU1KKVA42&subj=RE+shuffle+FetchFailedException+in+spark+on+YARN+job

On Sun, Jan 24, 2016 at 5:52 AM, wangzhenhua (G) 
<wangzhen...@huawei.com<mailto:wangzhen...@huawei.com>> wrote:
Hi,

I have a problem of time out in shuffle, it happened after shuffle write and at 
the start of shuffle read,
logs on driver and executors are shown as below. Spark version is 1.5. Looking 
forward to your replys. Thanks!

logs on driver only have warnings:
WARN TaskSetManager: Lost task 38.0 in stage 27.0 (TID 127459, linux-162): 
FetchFailed(BlockManagerId(66, 172.168.100.12, 23028), shuffleId=9, mapId=55, 
reduceId=38, message=
org.apache.spark.shuffle.FetchFailedException: 
java.util.concurrent.TimeoutException: Timeout waiting for task.
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:321)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:306)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:173)
        at 
org.apache.spark.sql.execution.TungstenSort.org<http://org.apache.spark.sql.execution.TungstenSort.org>$apache$spark$sql$execution$TungstenSort$$executePartition$1(sort.scala:160)
        at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$4.apply(sort.scala:169)
        at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$4.apply(sort.scala:169)
        at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:99)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: 
Timeout waiting for task.
        at 
org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
        at 
org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:196)
        at 
org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:76)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:205)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
        at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
        at 
org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:97)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:152)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:301)
        ... 57 more
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at 
org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
        at 
org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
        at 
org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:192)
        ... 66 more

logs on executor have errors like:
ERROR | [shuffle-client-1] | Still have 1 requests outstanding when connection 
from /172.168.100.12:23908<http://172.168.100.12:23908> is closed | 
org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:102)

________________________________
best regards,
-zhenhua

Reply via email to