Hi,

While loading some 10k+ .gz files through HiveServer with LOAD DATA statements, we run into "Too many open files" errors.
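
The driving side is nothing special, essentially one LOAD DATA statement per file. A stripped-down sketch of such a loop over the Hive JDBC driver (the URL, directory, and table name here are placeholders for illustration, not our real setup):

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // Placeholder connection URL, directory, and table name.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hiveserver:10000/default", "", "");
        Statement stmt = conn.createStatement();
        for (File f : new File("/data/incoming").listFiles()) {
            if (!f.getName().endsWith(".gz")) continue;
            // Each statement makes HiveServer copy one file into HDFS
            // (the exec.CopyTask lines in the log below).
            stmt.execute("LOAD DATA LOCAL INPATH '" + f.getAbsolutePath()
                    + "' INTO TABLE raw_logs");
        }
        stmt.close();
        conn.close();
    }
}

After a few thousand files have gone through, HiveServer starts logging: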

11/01/06 22:12:42 INFO exec.CopyTask: Copying data from file:XXX.gz to hdfs://YYY
11/01/06 22:12:42 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
11/01/06 22:12:42 INFO hdfs.DFSClient: Abandoning block blk_8251287732961496983_1741138
11/01/06 22:12:48 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
11/01/06 22:12:48 INFO hdfs.DFSClient: Abandoning block blk_-2561354015640936272_1741138
11/01/06 22:12:54 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
        at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
        at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)

11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block blk_2907917521214666486_1741138 bad datanode[0] 172.27.1.34:50010
11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block blk_2907917521214666486_1741138 in pipeline 172.27.1.34:50010, 172.27.1.4:50010: bad datanode 172.27.1.34:50010
Exception in thread "DataStreamer for file YYY block blk_2907917521214666486_1741138" java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:351)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:313)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy9.recoverBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2581)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2265)

After this, the HiveServer stops working, throwing various exceptions due to too many open files.
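
A quick way to watch the descriptor count climb is to list /proc/<pid>/fd for the HiveServer process. This is just a diagnostic sketch (Linux only, nothing from Hive itself):

import java.io.File;

public class FdCount {
    public static void main(String[] args) {
        // Usage: java FdCount <hiveserver-pid>
        // On Linux, /proc/<pid>/fd holds one symlink per open descriptor;
        // it is readable only by the same user or root.
        String pid = args[0];
        String[] fds = new File("/proc/" + pid + "/fd").list();
        System.out.println("open fds for pid " + pid + ": "
                + (fds == null ? "unreadable" : fds.length));
    }
}

Watching this during the load should show whether descriptors are being released between statements.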

This is from a trunk checkout from yesterday, January 6th.
It seems like we are leaking connections to the datanodes on port 50010?
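
To check that theory, one can count sockets to the datanode port by scanning /proc/net/tcp (50010 decimal is C35A hex; the file is system-wide, so this has to run on the HiveServer host, and IPv6 sockets would show up in /proc/net/tcp6 instead):

import java.io.BufferedReader;
import java.io.FileReader;

public class DataNodeSockets {
    public static void main(String[] args) throws Exception {
        // /proc/net/tcp lists remote endpoints as HEXIP:HEXPORT,
        // so sockets to port 50010 end in ":C35A".
        int count = 0;
        BufferedReader in = new BufferedReader(new FileReader("/proc/net/tcp"));
        in.readLine(); // skip the header line
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length > 2 && fields[2].endsWith(":C35A")) {
                count++;
            }
        }
        in.close();
        System.out.println("sockets to port 50010: " + count);
    }
}

If that number tracks the number of files loaded instead of staying bounded, the write-pipeline sockets are never being closed.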

Regards,
Terje
