From what I understood, it will then be possible to tell Hive it is loading a CSV while you are in fact loading something else (sequence files, for instance). I don't think that's a big deal.
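In other words, with the check disabled a load like the one below should go through even though the file does not match the table's declared format, and the mismatch only surfaces later when a query tries to read the data. This is only a sketch; the table name and path are made up:

    # table declared as plain text
    hive -e "CREATE TABLE events (line STRING) STORED AS TEXTFILE;"

    # /tmp/events.seq is actually a SequenceFile. With hive.fileformat.check
    # at its default (true) this load should be rejected as a wrong file
    # format; with it set to false it is accepted as-is.
    hive -e "LOAD DATA LOCAL INPATH '/tmp/events.seq' INTO TABLE events;"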
On 7 Jan 2011, at 11:41, "Terje Marthinussen" <tmarthinus...@gmail.com> wrote:

Seems like this works for me too! That probably saved me a bunch of hours tracing this down through Hive and Hadoop.

Do you know what the side effect of setting this to false would be?

Thanks!
Terje

On Fri, Jan 7, 2011 at 4:39 PM, Bennie Schut <bsc...@ebuddy.com> wrote:

In the past I ran into a similar problem which was actually caused by a bug in Hadoop. Someone was nice enough to come up with a workaround for this. Perhaps you are running into a similar problem. I also had this problem when calling lots of "load file" commands. After adding this to the hive-site.xml we never had this problem again:

<!-- workaround for connection leak problem fixed in HADOOP-5476
     but only committed to Hadoop 0.21.0 -->
<property>
  <name>hive.fileformat.check</name>
  <value>false</value>
</property>

________________________________
From: Terje Marthinussen [mailto:tmarthinus...@gmail.com]
Sent: Friday, January 07, 2011 4:14 AM
To: user@hive.apache.org
Subject: Re: Too many open files

No, the problem is connections to datanodes on port 50010.

Terje

On Fri, Jan 7, 2011 at 11:46 AM, Shrijeet Paliwal <shrij...@rocketfuel.com> wrote:

You mentioned that you got the code from trunk, so it is fair to assume you are not hitting https://issues.apache.org/jira/browse/HIVE-1508. Worth checking still. Are all the open files hive history files (they look like hive_job_log*.txt)? Like Viral suggested, you can check that by monitoring open files.

-Shrijeet

On Thu, Jan 6, 2011 at 6:15 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:
> Hi Terje,
>
> I have asked about this issue in an earlier thread but never got any
> response.
>
> I get this exception when I am using Hive over Thrift and submitting 1000s
> of LOAD FILE commands. If you actively monitor the open file count of the
> user under which the hive instance runs, it keeps creeping up for every
> LOAD FILE command sent to it.
>
> I have a temporary fix: increasing the number of open files to 60000+ and
> then periodically restarting my thrift server (once every 2 days) to release
> the open file handles.
>
> I would appreciate some feedback. (trying to find my earlier email)
>
> Thanks,
> Viral
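(A rough way to watch this on Linux, assuming HiveServer runs as a standalone Java process; the pgrep pattern below is only a guess and may need adjusting for your setup:)

    HIVESERVER_PID=$(pgrep -f org.apache.hadoop.hive.service.HiveServer)

    # total open file descriptors held by the HiveServer process
    ls /proc/"$HIVESERVER_PID"/fd | wc -l

    # just the connections to datanodes on port 50010
    lsof -a -p "$HIVESERVER_PID" -i TCP:50010 | wc -l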
> On Thu, Jan 6, 2011 at 4:57 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>>
>> Hi,
>>
>> While loading some 10k+ .gz files through HiveServer with LOAD FILE etc. etc.:
>>
>> 11/01/06 22:12:42 INFO exec.CopyTask: Copying data from file:XXX.gz to hdfs://YYY
>> 11/01/06 22:12:42 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>> 11/01/06 22:12:42 INFO hdfs.DFSClient: Abandoning block blk_8251287732961496983_1741138
>> 11/01/06 22:12:48 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>> 11/01/06 22:12:48 INFO hdfs.DFSClient: Abandoning block blk_-2561354015640936272_1741138
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
>>         at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
>>         at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
>>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
>>         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block blk_2907917521214666486_1741138 bad datanode[0] 172.27.1.34:50010
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block blk_2907917521214666486_1741138 in pipeline 172.27.1.34:50010, 172.27.1.4:50010: bad datanode 172.27.1.34:50010
>> Exception in thread "DataStreamer for file YYY block blk_2907917521214666486_1741138" java.lang.NullPointerException
>>         at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:351)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:313)
>>         at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:720)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>         at $Proxy9.recoverBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2581)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2102)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2265)
>>
>> After this, the HiveServer stops working, throwing various exceptions due to too many open files.
>> This is from a trunk checkout from yesterday, January 6th.
>> Seems like we are leaking connections to datanodes on port 50010?
>>
>> Regards,
>> Terje