Hi again,

Below is the log from the executor:

FetchFailed(BlockManagerId(4, compute-10-0.local, 38594), shuffleId=0, mapId=117, reduceId=117, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to compute-10-0.local/10.10.255.241:38594
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Failed to connect to compute-10-0.local/10.10.255.241:38594
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    ... 3 more
Caused by: java.net.ConnectException: Connection refused: compute-10-0.local/10.10.255.241:38594
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    ... 1 more

)
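
To see which hosts the failed fetches point at, the connection targets in a log like the one above can be tallied. This is only a minimal Python sketch, assuming the executor log is available as plain text; the name count_failed_targets and the regular expression are illustrative and not part of Spark.

```python
import re
from collections import Counter

# Matches the "Failed to connect to <host>/<ip>:<port>" text seen in the log.
# \s+ also tolerates the line wrapping that mail clients introduce.
FAILED_CONNECT = re.compile(
    r"Failed to connect to\s+([\w.-]+)/(\d+(?:\.\d+){3}):(\d+)"
)


def count_failed_targets(log_text):
    """Count how often each (host, ip, port) appears as a failed target."""
    return Counter(FAILED_CONNECT.findall(log_text))


if __name__ == "__main__":
    # Two lines copied from the executor log, used here as sample input.
    sample = (
        "org.apache.spark.shuffle.FetchFailedException: Failed to connect to "
        "compute-10-0.local/10.10.255.241:38594\n"
        "Caused by: java.io.IOException: Failed to connect to "
        "compute-10-0.local/10.10.255.241:38594\n"
    )
    for (host, ip, port), hits in count_failed_targets(sample).most_common():
        print(host, ip, port, hits)
```

If one host dominates the counts, that node's executor and NodeManager logs are probably the place to look first.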

I am using Spark 1.3.1. Is the problem caused by https://issues.apache.org/jira/browse/SPARK-4516?

Best,
Patcharee

On 3 June 2015 10:11, Akhil Das wrote:
Which version of Spark? It looks like you are hitting this one: https://issues.apache.org/jira/browse/SPARK-4516

Thanks
Best Regards

On Wed, Jun 3, 2015 at 1:06 PM, patcharee <patcharee.thong...@uni.no> wrote:

    This is the log I can get:

    15/06/02 16:37:31 INFO shuffle.RetryingBlockFetcher: Retrying fetch (2/3) for 4 outstanding blocks after 5000 ms
    15/06/02 16:37:36 INFO client.TransportClientFactory: Found inactive connection to compute-10-3.local/10.10.255.238:33671, creating a new one.
    15/06/02 16:37:36 WARN server.TransportChannelHandler: Exception in connection from /10.10.255.238:35430
    java.io.IOException: Connection reset by peer
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
            at sun.nio.ch.IOUtil.read(IOUtil.java:192)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
            at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
            at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
            at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
            at java.lang.Thread.run(Thread.java:744)
    15/06/02 16:37:36 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1033433133943, chunkIndex=1}, buffer=FileSegmentManagedBuffer{file=/hdisk3/hadoop/yarn/local/usercache/patcharee/appcache/application_1432633634512_0213/blockmgr-12d59e6b-0895-4a0e-9d06-152d2f7ee855/09/shuffle_0_56_0.data, offset=896, length=1132499356}} to /10.10.255.238:35430; closing connection
    java.nio.channels.ClosedChannelException
    15/06/02 16:37:38 ERROR shuffle.RetryingBlockFetcher: Exception while beginning fetch of 4 outstanding blocks (after 2 retries)
    java.io.IOException: Failed to connect to compute-10-3.local/10.10.255.238:33671
            at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
            at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
            at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
            at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
            at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
            at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    Caused by: java.net.ConnectException: Connection refused: compute-10-3.local/10.10.255.238:33671
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
            at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
            at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
            at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
            ... 1 more



    Best,
    Patcharee


    On 3 June 2015 09:21, Akhil Das wrote:
    You need to look into your executor/worker logs to see what's
    going on.

    Thanks
    Best Regards

    On Wed, Jun 3, 2015 at 12:01 PM, patcharee
    <patcharee.thong...@uni.no> wrote:

        Hi,

        What can cause the error "ERROR cluster.YarnScheduler: Lost
        executor"? How can I fix it?

        Best,
        Patcharee





