Zsolt - what version of Java are you running?

On Mon, Mar 30, 2015 at 7:12 AM, Zsolt Tóth <toth.zsolt....@gmail.com>
wrote:

> Thanks for your answer!
> I don't call .collect() to trigger the execution; I call it because I
> need the RDD on the driver. It's not a huge RDD, and it's no larger than
> the one returned for the 50GB input.
>
> The end of the stack trace:
>
> The two IPs belong to the two worker nodes; I think they can't connect to
> the driver after they finish their part of the collect().
>
> 15/03/30 10:38:25 INFO executor.Executor: Finished task 872.0 in stage 1.0 (TID 1745). 1414 bytes result sent to driver
> 15/03/30 10:38:25 INFO storage.MemoryStore: ensureFreeSpace(200) called with curMem=405753, maxMem=4883742720
> 15/03/30 10:38:25 INFO storage.MemoryStore: Block rdd_4_867 stored as values in memory (estimated size 200.0 B, free 4.5 GB)
> 15/03/30 10:38:25 INFO storage.MemoryStore: ensureFreeSpace(80) called with curMem=405953, maxMem=4883742720
> 15/03/30 10:38:25 INFO storage.MemoryStore: Block rdd_4_868 stored as values in memory (estimated size 80.0 B, free 4.5 GB)
> 15/03/30 10:38:25 INFO storage.BlockManagerMaster: Updated info of block rdd_4_867
> 15/03/30 10:38:25 INFO executor.Executor: Finished task 867.0 in stage 1.0 (TID 1740). 1440 bytes result sent to driver
> 15/03/30 10:38:25 INFO storage.BlockManagerMaster: Updated info of block rdd_4_868
> 15/03/30 10:38:25 INFO executor.Executor: Finished task 868.0 in stage 1.0 (TID 1741). 1422 bytes result sent to driver
> 15/03/30 10:53:45 WARN server.TransportChannelHandler: Exception in connection from /10.102.129.251:42026
> java.io.IOException: Connection timed out
>       at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>       at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>       at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
>       at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
>       at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>       at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>       at java.lang.Thread.run(Thread.java:745)
> 15/03/30 10:53:45 WARN server.TransportChannelHandler: Exception in connection from /10.102.129.251:41703
> java.io.IOException: Connection timed out
>       [identical stack trace omitted]
> 15/03/30 10:53:46 WARN server.TransportChannelHandler: Exception in connection from /10.99.144.92:49021
> java.io.IOException: Connection timed out
>       [identical stack trace omitted]
> 15/03/30 10:53:51 WARN server.TransportChannelHandler: Exception in connection from /10.99.144.92:49033
> java.io.IOException: Connection timed out
>       [identical stack trace omitted]
>
>
> 2015-03-29 17:01 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
>> Don't call .collect() if your data size is huge; you can simply do a
>> count() to trigger the execution.
>>
>> Can you paste your exception stack trace so that we'll know what's
>> happening?
>>
>> Thanks
>> Best Regards
>>
>> On Fri, Mar 27, 2015 at 9:18 PM, Zsolt Tóth <toth.zsolt....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a simple Spark application: it creates an input RDD with
>>> sc.textFile, and it calls flatMapToPair, reduceByKey and map on it. The
>>> output RDD is small, a few MBs. Then I call collect() on the output.
>>>
>>> If the text file is ~50GB, the job finishes in a few minutes. However,
>>> if it's larger (~100GB), the execution hangs at the end of the collect()
>>> stage. The UI shows one active job (collect), one completed stage
>>> (flatMapToPair) and one active stage (collect). The collect stage has
>>> 880/892 tasks succeeded, so I think the issue happens when the whole job
>>> is about to finish (every task on the UI is either in SUCCESS or in
>>> RUNNING state).
>>> The driver and the containers don't log anything for 15 minutes, then I
>>> get a Connection timed out.
>>>
>>> I run the job in yarn-cluster mode on Amazon EMR with Spark 1.2.1 and
>>> Hadoop 2.4.0.
>>>
>>> This happens every time I run the process with the larger input data, so
>>> I think this isn't just a connection issue or something like that. Is
>>> this a Spark bug, or is something wrong with my setup?
>>>
>>> Zsolt
>>>
>>
>>
>
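For reference, a minimal sketch of the kind of job Zsolt describes (input RDD via sc.textFile, then flatMapToPair, reduceByKey and map, with a final collect()). The input path, the word-count-style key extraction, and the class name are illustrative assumptions on my part, not details taken from the original job; only the chain of operations comes from the thread. Written against the Spark 1.x Java API (PairFlatMapFunction returning an Iterable):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class CollectSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("collect-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Placeholder path; the original job read ~50-100GB of text.
        JavaRDD<String> input = sc.textFile("hdfs:///path/to/input");

        // flatMapToPair -> reduceByKey, as described in the thread.
        // The split-on-whitespace key extraction is a made-up stand-in.
        JavaPairRDD<String, Integer> reduced = input.flatMapToPair(line -> {
            List<Tuple2<String, Integer>> pairs = new ArrayList<>();
            for (String token : line.split("\\s+")) {
                pairs.add(new Tuple2<>(token, 1));
            }
            return pairs; // Spark 1.x expects an Iterable here
        }).reduceByKey((a, b) -> a + b);

        JavaRDD<String> output = reduced.map(t -> t._1 + "\t" + t._2);

        // Akhil's suggestion - count() triggers execution without shipping
        // results to the driver:
        //   long n = output.count();
        // Zsolt's case - he actually needs the (small) result on the driver:
        List<String> result = output.collect();
        System.out.println("collected " + result.size() + " records");

        sc.stop();
    }
}
```

This requires a Spark cluster (or local master) to run, so it is a sketch of the shape of the job rather than something to execute as-is; the distinction in the comments is the one from the thread, where count() suffices when the result itself isn't needed on the driver.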
