From what I understood, it will then be possible to tell Hive it is loading a CSV
while you are in fact loading something else (sequence files, for instance). I
don't think that's a big deal.
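
For illustration, a rough sketch of what that allows (the table name and file
path here are made up): with the check off, a load like this goes through
without complaint even though the file is not a sequence file, and queries on
the table may later fail or return garbage.

  # hypothetical example: table declared as SEQUENCEFILE, file is gzipped text
  hive -e "CREATE TABLE logs_seq (line STRING) STORED AS SEQUENCEFILE;
           LOAD DATA LOCAL INPATH '/tmp/logs.txt.gz' INTO TABLE logs_seq;"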

On 7 Jan 2011, at 11:41, "Terje Marthinussen"
<tmarthinus...@gmail.com> wrote:

Seems like this works for me too!

That probably saved me a bunch of hours tracing this down through Hive and
Hadoop.

Do you know what the side effects of setting this to false would be?

Thanks!
Terje

On Fri, Jan 7, 2011 at 4:39 PM, Bennie Schut <bsc...@ebuddy.com> wrote:
In the past I ran into a similar problem which was actually caused by a bug in
Hadoop. Someone was nice enough to come up with a workaround for it. Perhaps
you are running into the same thing; I also had this problem when calling
lots of “load file” commands. After adding this to hive-site.xml we never
had this problem again:

  <!-- workaround for connection leak problem fixed in HADOOP-5476 but only
       committed to Hadoop 0.21.0 -->
  <property>
    <name>hive.fileformat.check</name>
    <value>false</value>
  </property>
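
If you want to confirm you are hitting the same leak, a rough check (the
HiveServer pid is a placeholder) is to count the sockets the process holds
open to the datanode port while the loads run; with the leak, the count climbs
with every load and never drops:

  # count connections the HiveServer holds open to datanode port 50010
  lsof -p <hiveserver-pid> | grep ':50010' | wc -l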

________________________________
From: Terje Marthinussen [mailto:tmarthinus...@gmail.com]
Sent: Friday, January 07, 2011 4:14 AM
To: user@hive.apache.org
Subject: Re: Too many open files

No, the problem is connections to datanodes on port 50010.

Terje
On Fri, Jan 7, 2011 at 11:46 AM, Shrijeet Paliwal <shrij...@rocketfuel.com> wrote:
You mentioned that you got the code from trunk, so it's fair to assume you
are not hitting https://issues.apache.org/jira/browse/HIVE-1508.
Still worth checking, though. Are all the open files hive history files
(they look like hive_job_log*.txt)? Like Viral suggested, you can
check that by monitoring open files.
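
For example (the pid is a placeholder, and this assumes Linux /proc),
something along these lines separates leaked history files from leaked sockets:

  # total open descriptors vs. ones pointing at hive history files;
  # if most are hive_job_log*.txt you are likely hitting HIVE-1508
  ls /proc/<hiveserver-pid>/fd | wc -l
  ls -l /proc/<hiveserver-pid>/fd | grep -c 'hive_job_log'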

-Shrijeet

On Thu, Jan 6, 2011 at 6:15 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:
> Hi Terje,
>
> I have asked about this issue in an earlier thread but never got any
> response.
>
> I get this exception when I am using Hive over Thrift and submitting 1000s
> of LOAD FILE commands. If you actively monitor the open file count for the
> user the hive instance runs under, it keeps creeping up with every
> LOAD FILE command sent to it.
>
> I have a temporary fix: increasing the open file limit to 60000+ and
> then periodically restarting my Thrift server (once every 2 days) to release
> the open file handles.
>
> I would appreciate some feedback. (trying to find my earlier email)
>
> Thanks,
> Viral
>
> On Thu, Jan 6, 2011 at 4:57 PM, Terje Marthinussen <tmarthinus...@gmail.com>
> wrote:
>>
>> Hi,
>> While loading some 10k+ .gz files through HiveServer with LOAD FILE etc.
>> etc.
>> 11/01/06 22:12:42 INFO exec.CopyTask: Copying data from file:XXX.gz to
>> hdfs://YYY
>> 11/01/06 22:12:42 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.net.SocketException: Too many open files
>> 11/01/06 22:12:42 INFO hdfs.DFSClient: Abandoning block
>> blk_8251287732961496983_1741138
>> 11/01/06 22:12:48 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.net.SocketException: Too many open files
>> 11/01/06 22:12:48 INFO hdfs.DFSClient: Abandoning block
>> blk_-2561354015640936272_1741138
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: DataStreamer Exception:
>> java.io.IOException: Too many open files
>>         at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
>>         at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
>>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
>>         at
>> sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>>         at
>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block
>> blk_2907917521214666486_1741138 bad datanode[0] 172.27.1.34:50010
>> 11/01/06 22:12:54 WARN hdfs.DFSClient: Error Recovery for block
>> blk_2907917521214666486_1741138 in pipeline 172.27.1.34:50010,
>> 172.27.1.4:50010: bad datanode 172.27.1.34:50010
>> Exception in thread "DataStreamer for file YYY block blk_29
>> 07917521214666486_1741138" java.lang.NullPointerException
>>         at
>> org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:351)
>>         at
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:313)
>>         at
>> org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:720)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>         at $Proxy9.recoverBlock(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2581)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2102)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2265)
>> After this, the HiveServer stops working, throwing various exceptions due
>> to too many open files.
>> This is from a trunk checkout from yesterday January 6th.
>> Seems like we are leaking connections to datanodes on port 50010?
>> Regards,
>> Terje
>

