Re: merge small orc files

2015-04-20 Thread Gopal Vijayaraghavan
Hi,

> How to set the configuration hive-site.xml to automatically merge small orc file (output from mapreduce job) in hive 0.14 ?

Hive cannot add work stages to a MapReduce job it did not plan. Hive honors hive.merge.mapfiles=true only when it generates the plan itself, by adding the merge work to that plan as a conditional task.

Re: Table Lock Manager: ZooKeeper cluster

2015-04-20 Thread Xuefu Zhang
I'm not a ZooKeeper expert, but ZooKeeper is designed to be lightweight, high-performance, and fast to respond. Unless your ZooKeeper is already overloaded, I don't see why you would need a separate ZooKeeper cluster just for Hive. There are a few ZooKeeper usages in Hive, the add
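If Hive shares an existing ZooKeeper ensemble instead of a dedicated one, the lock manager only has to be pointed at that ensemble. The sketch below builds the relevant hive-site.xml properties as a dict. It assumes the ZooKeeper-backed lock manager properties `hive.support.concurrency`, `hive.lock.manager`, `hive.zookeeper.quorum` and `hive.zookeeper.client.port`; the host names are hypothetical placeholders, and you should confirm the property names against the documentation for your Hive version.

```python
def hive_lock_properties(zk_hosts, client_port=2181):
    """Build hive-site.xml properties that point Hive's lock manager at an
    existing ZooKeeper ensemble (sketch; property names assumed from Hive docs)."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return {
        "hive.support.concurrency": "true",
        "hive.lock.manager": (
            "org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager"
        ),
        # Comma-separated host list; the client port is supplied separately.
        "hive.zookeeper.quorum": ",".join(zk_hosts),
        "hive.zookeeper.client.port": str(client_port),
    }


if __name__ == "__main__":
    # Hypothetical hosts of an ensemble already used by other services.
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    for name, value in hive_lock_properties(hosts).items():
        print(f"{name}={value}")
```

Reusing the ensemble this way keeps Hive's locks in the same ZooKeeper that other services use, which matches the point above that a separate cluster is only worth it when the shared one is overloaded.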

Re: merge small orc files

2015-04-20 Thread Xuefu Zhang
Also check hive.merge.size.per.task and hive.merge.smallfiles.avgsize.

On Mon, Apr 20, 2015 at 8:29 AM, patcharee wrote:
> Hi,
>
> How to set the configuration hive-site.xml to automatically merge small
> orc file (output from mapreduce job) in hive 0.14 ?
>
> This is my current configuration:
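Hive's documentation describes the interplay of these two settings roughly as follows: when the average output file size of a job is below hive.merge.smallfiles.avgsize, Hive starts an extra job to merge the outputs (for map-only jobs if hive.merge.mapfiles is true, for map-reduce jobs if hive.merge.mapredfiles is true), and hive.merge.size.per.task sets the target size of the merged files. The sketch below models that decision in Python. The default values are assumptions recalled from Hive's defaults (verify them in your hive-default.xml), and the merged-file count is a simplified estimate, not Hive's exact split logic.

```python
import math

# Assumed defaults in bytes; check hive-default.xml for your Hive version.
DEFAULT_SMALLFILES_AVGSIZE = 16_000_000   # hive.merge.smallfiles.avgsize
DEFAULT_SIZE_PER_TASK = 256_000_000       # hive.merge.size.per.task


def needs_merge(file_sizes, avgsize=DEFAULT_SMALLFILES_AVGSIZE):
    """Return True when the average output file size falls below the threshold,
    i.e. the condition under which the conditional merge task would run."""
    if not file_sizes:
        return False
    return sum(file_sizes) / len(file_sizes) < avgsize


def estimated_merged_files(file_sizes, size_per_task=DEFAULT_SIZE_PER_TASK):
    """Rough estimate of how many files remain after merging (simplified model)."""
    total = sum(file_sizes)
    return max(1, math.ceil(total / size_per_task))


if __name__ == "__main__":
    small_outputs = [1_000_000] * 200  # 200 files of about 1 MB each
    print(needs_merge(small_outputs), estimated_merged_files(small_outputs))
```

Raising hive.merge.smallfiles.avgsize makes the merge trigger more often; raising hive.merge.size.per.task yields fewer, larger merged files.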

Using Hive as a file comparison and grep-ping tool

2015-04-20 Thread Sanjay Subramanian
Hey guys, as data wranglers and programmers we often need quick tools. One tool I need almost every day is one that greps a file based on the contents of another file. One could write this in Perl or Python, but since I am already using the Hadoop ecosystem extensively, I thought: why not do this in Hive?  P
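Grepping one file by the contents of another is essentially a semi-join: keep each line of the data file whose key appears in the key file. The Python sketch below shows that idea on a single machine; it is not necessarily the approach taken in the original post. It assumes, hypothetically, one key per line in the key file and tab-separated data with the key in a chosen column. In Hive, the same relational idea is commonly written with LEFT SEMI JOIN.

```python
from typing import Iterable, Iterator, Set


def load_keys(key_lines: Iterable[str]) -> Set[str]:
    """Collect non-empty, whitespace-stripped keys into a set for O(1) lookups."""
    return {line.strip() for line in key_lines if line.strip()}


def semi_join(data_lines: Iterable[str], keys: Set[str],
              column: int = 0, sep: str = "\t") -> Iterator[str]:
    """Yield data lines whose field at `column` exactly matches one of `keys`."""
    for line in data_lines:
        fields = line.rstrip("\n").split(sep)
        if column < len(fields) and fields[column] in keys:
            yield line


if __name__ == "__main__":
    keys = load_keys(["u1\n", "u3\n"])
    data = ["u1\tclick\n", "u2\tview\n", "u3\tbuy\n"]
    for row in semi_join(data, keys):
        print(row, end="")
```

With real files, pass open file objects (for example `open("keys.txt")` and `open("data.tsv")`, hypothetical names) instead of the in-memory lists; both functions stream line by line, so only the key set is held in memory.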

Clear up Hive scratch directory

2015-04-20 Thread Martin Benson
Hi, one of my users tried to run a HUGE join, which failed due to a lack of space in HDFS. This has left a large amount of data in the Hive scratch directory, which I need to clear down. I've tried setting hive.start.cleanup.scratchdir to true and restarting Hive, but it didn't
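If restarting with hive.start.cleanup.scratchdir does not remove the leftovers, one manual option is to select old entries under the scratch directory and remove them with `hdfs dfs -rm -r`. The sketch below only selects and prints commands; it deletes nothing. It assumes, as a hypothesis you must verify, that entries older than the cutoff belong to no running query, and the `(path, modification_time)` listing format is hypothetical (for example, parsed from `hdfs dfs -ls` output of the scratch directory).

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def stale_entries(listing: Iterable[Tuple[str, datetime]],
                  now: datetime, max_age: timedelta) -> List[str]:
    """Return paths whose modification time is older than `max_age` before `now`.

    `listing` is a hypothetical sequence of (path, modification_time) pairs.
    """
    cutoff = now - max_age
    return sorted(path for path, mtime in listing if mtime < cutoff)


def removal_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build (but do not run) `hdfs dfs -rm -r` commands for review."""
    return [["hdfs", "dfs", "-rm", "-r", path] for path in paths]


if __name__ == "__main__":
    # Hypothetical listing of a scratch directory.
    now = datetime(2015, 4, 20, 12, 0)
    listing = [
        ("/scratch/user/old_query", datetime(2015, 4, 18, 9, 0)),
        ("/scratch/user/recent_query", datetime(2015, 4, 20, 11, 0)),
    ]
    for cmd in removal_commands(stale_entries(listing, now, timedelta(days=1))):
        print(" ".join(cmd))
```

Keeping selection separate from execution lets an administrator review the list before running anything against HDFS.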

Re: Orc file and Hive Optimiser

2015-04-20 Thread Alan Gates
Mich Talebzadeh, April 19, 2015 at 12:32: Finally, this is more of a speculative question. If we have ORC files that provide good functionality, is there any reason to deploy a columnar database such as HBase or Cassandra if Hive can do the job as well?

merge small orc files

2015-04-20 Thread patcharee
Hi, how do I set the configuration in hive-site.xml to automatically merge small ORC files (output from a MapReduce job) in Hive 0.14? This is my current configuration:

hive.merge.mapfiles true
hive.merge.mapredfiles true
hive.merge.orcfile.st
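In hive-site.xml each setting is a `<property>` element with a `<name>` and a `<value>`. The sketch below renders the merge settings named in this thread in that layout using Python's standard `xml.etree.ElementTree`. The size values are illustrative, not recommendations, and the truncated ORC-specific property from the message is deliberately left out.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(properties):
    """Render a dict of Hive settings as a hive-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    merge_settings = {
        "hive.merge.mapfiles": "true",
        "hive.merge.mapredfiles": "true",
        # Illustrative sizes in bytes; tune for your cluster.
        "hive.merge.smallfiles.avgsize": 16000000,
        "hive.merge.size.per.task": 256000000,
    }
    print(hive_site_xml(merge_settings))
```

As Gopal points out in this thread, these settings only take effect for queries that Hive itself plans; files written directly by a standalone MapReduce job are not merged by them.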

Re: [Hive 0.13.1] - Explanation/confusion over "Fatal error occurred when node tried to create too many dynamic partitions" on small dataset with dynamic partitions

2015-04-20 Thread Daniel Harper
In our case we've chosen 128 buckets, but that's just an arbitrary figure we picked to get a good, even distribution. To fix the issue we were having with the small file, we just updated the setting hive.exec.max.dynamic.partitions.pernode to 1; that way if we do run a tiny file (very rarely
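The "too many dynamic partitions" error is raised when a single mapper or reducer would create more partitions than hive.exec.max.dynamic.partitions.pernode allows. The sketch below is a simplified model for reasoning about that limit: rows are routed to tasks by a bucket hash, and the distinct partition values each task would write are counted and checked against the limit. It assumes, hypothetically, that routing follows the bucket key; `zlib.crc32` is a deterministic stand-in, and Hive's real bucketing hash is different.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

NUM_BUCKETS = 128  # the bucket count mentioned in the thread


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic stand-in for Hive's bucketing hash (not Hive's actual function)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partitions_per_task(rows: Iterable[Tuple[str, str]],
                        num_tasks: int) -> Dict[int, Set[str]]:
    """Map each task id to the set of dynamic partition values it would write.

    Each row is a (bucket_key, partition_value) pair; rows are routed by bucket.
    """
    seen: Dict[int, Set[str]] = defaultdict(set)
    for key, partition in rows:
        task = bucket_of(key) % num_tasks
        seen[task].add(partition)
    return dict(seen)


def tasks_over_limit(rows: Iterable[Tuple[str, str]], num_tasks: int,
                     max_per_node: int) -> List[int]:
    """Return ids of tasks that would exceed the per-node dynamic partition limit."""
    counts = partitions_per_task(rows, num_tasks)
    return sorted(t for t, parts in counts.items() if len(parts) > max_per_node)


if __name__ == "__main__":
    # Hypothetical rows: (customer id, date partition).
    rows = [("c1", "2015-04-19"), ("c2", "2015-04-20"), ("c3", "2015-04-20")]
    print(tasks_over_limit(rows, num_tasks=1, max_per_node=1))
```

In this model, a single task that receives rows spanning many partition values is what trips the limit, regardless of how small the input file is.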

Re: UDF cannot be found when the query is submitted via templeton

2015-04-20 Thread Eugene Koifman
Can you give the complete REST call you are making to submit the query? From: Xiaoyong Zhu <xiaoy...@microsoft.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org> Date: Sunday, April 19, 2015 at 8:23 PM To: "user@hive.apache.org
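For reference, a Hive job submission to WebHCat (Templeton) is a form-encoded POST to the `/templeton/v1/hive` resource with parameters such as `user.name`, `execute` and `statusdir`. The sketch below builds such a request with Python's standard library without sending it. The host, user, status directory, jar path, UDF class and table are all hypothetical, and registering the UDF inside the same `execute` script is one possible approach under those assumptions, not a confirmed fix for the issue in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_hive_job_request(host, user, query, statusdir, port=50111):
    """Build (without sending) a WebHCat POST request that runs a Hive script."""
    url = f"http://{host}:{port}/templeton/v1/hive"
    body = urlencode({
        "user.name": user,
        "execute": query,        # the HiveQL text to run
        "statusdir": statusdir,  # where WebHCat writes stdout/stderr/exit code
    }).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    # Hypothetical jar, class and table; the UDF is registered in the same script.
    query = ("ADD JAR hdfs:///user/alice/udfs.jar; "
             "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'; "
             "SELECT my_udf(col) FROM my_table;")
    req = build_hive_job_request("webhcat.example.com", "alice", query,
                                 "/user/alice/status")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually submit the job.
```

Sharing the exact URL and form fields in this shape makes it easier to see whether the UDF registration actually reaches the Hive session that WebHCat launches.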