Re: Hive dynamic partitions generate multiple files

2014-01-28 Thread Cosmin Cătălin Sanda
Hi Andre, The reason is that I want those partitions to go into other queries. If the individual files are only a few MB, then the performance will be sub-optimal. As far as I understood, the individual files need to be at least around 140 MB for the maps to work properly. …
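For context, Hive has settings that merge small output files at the end of a job, which is one common way to address the small-files concern raised above. A minimal sketch; the exact property names and sensible thresholds should be checked against your Hive version:

```sql
-- Merge small files produced by map-only and map-reduce jobs
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Trigger a merge pass when the average output file is below ~128 MB
SET hive.merge.smallfiles.avgsize=134217728;
-- Target size of each merged file (~256 MB here)
SET hive.merge.size.per.task=268435456;
```

These are session-level settings; they can also be placed in hive-site.xml to apply to all jobs.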

Re: Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG

2014-01-28 Thread Thilina Gunarathne
Thanks for the information Edward. "When you use the default Serde (lazySerde) and sequence files hive writes a SequenceFile (create table x stored as sequence file), the key is null and hive serializes all the columns into a Text Writable that is easy for other tools to read." Does this…

Is it possible to run Hive 0.12 in local mode without Hadoop binary?

2014-01-28 Thread moon soo Lee
Hi, cool guys. I'm working on an open-source Hive GUI project called Zeppelin: http://zeppelin-project.org In this project, I execute Hive in local mode when the GUI application wants to run in local mode. Everything works really well. However, Hive local mode still needs the location of HADOOP_HOME and looks for…
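For reference, Hive's automatic local mode is driven by configuration properties rather than a separate binary. A hedged sketch, assuming a 0.x-era Hive; property names and thresholds may differ between versions:

```sql
-- Let Hive run small queries in-process instead of submitting to the cluster
SET hive.exec.mode.local.auto=true;
-- Input-size threshold below which a query qualifies for local execution (~128 MB)
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
```

Even with these set, classic Hive local mode still resolves Hadoop classes via HADOOP_HOME, which is the limitation the message above runs into.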

Re: Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG

2014-01-28 Thread Edward Capriolo
When you use the default SerDe (LazySerDe) and sequence files, Hive writes a SequenceFile (create table x stored as sequencefile); the key is null and Hive serializes all the columns into a Text Writable that is easy for other tools to read. Hive does not dictate the input format or the output…
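The statement Edward paraphrases can be sketched as plain DDL (table name and columns are illustrative):

```sql
-- Columns are serialized by the default (lazy) SerDe into a single Text value;
-- the SequenceFile record key is left null.
CREATE TABLE x (
  id   INT,
  name STRING
)
STORED AS SEQUENCEFILE;
```

Because the value is ordinary delimited Text, downstream MapReduce or Pig jobs can read the files with a standard SequenceFile reader and split the value on the field delimiter.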

Re: Hive dynamic partitions generate multiple files

2014-01-28 Thread Andre Araujo
Why do you need exactly one file? This is transparent to Hive and it should be handled seamlessly. Unless you have external requirements (reading the files from somewhere else) it shouldn't matter. HDFS support for file append is not a solid standard AFAIK, and will depend on the distribution and version…

Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG

2014-01-28 Thread Thilina Gunarathne
Hi, We have a requirement to store a large data set (more than 5 TB) mapped to a Hive table. This Hive table would be populated (and periodically appended to) using a Hive query from another Hive table. In addition to the Hive queries, we need to be able to run Java MapReduce and preferably Pig jobs as…
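A hedged sketch of the kind of DDL such a setup might use; the table and column names are hypothetical, and whether RCFile or SequenceFile fits better depends on which tools must read the data:

```sql
-- Columnar RCFile storage; partitioned so periodic appends land in new partitions
CREATE TABLE big_dataset (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS RCFILE;

-- Periodic append from another Hive table
INSERT INTO TABLE big_dataset PARTITION (dt = '2014-01-28')
SELECT user_id, payload
FROM staging_table;
```

Reading RCFile from raw MapReduce or Pig requires the RCFile input format and column deserialization, which is the interoperability question this thread explores.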

Re: Hive dynamic partitions generate multiple files

2014-01-28 Thread Cosmin Cătălin Sanda
Hi Andre, So the thing is like this: the first time the query runs, it generates one file per dynamic partition. The next time the query runs and needs to write to the same partition, it generates another file instead of merging with the existing one. E.g.: 1. The partitioned S3 path looks like…

Re: Hive dynamic partitions generate multiple files

2014-01-28 Thread Andre Araujo
Hi Cosmin, Have you tried using DISTRIBUTE BY to distribute the query's data by the partitioning columns? That way all the data for each partition should be sent to the same reducer and should be written to a single file in each partition, I think. If your data is being distributed by a different…
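Andre's suggestion can be sketched as follows (table and column names are hypothetical):

```sql
-- Route all rows with the same partition value to one reducer,
-- so each dynamic partition is written by a single task as a single file.
INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT col1, col2, dt
FROM source
DISTRIBUTE BY dt;
```

Note this controls how many files a single query writes per partition; separate jobs writing later to the same partition would still add their own files.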

Hive dynamic partitions generate multiple files

2014-01-28 Thread Cosmin Cătălin Sanda
Hi, I have a number of Hive jobs that run during a day. Each individual job outputs data to Amazon S3. The Hive jobs use dynamic partitioning. The problem is that when different jobs need to write to the same dynamic partition, they each generate one file. What I would like is for the…
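For reference, a typical dynamic-partition insert of the kind described looks roughly like this (table and column names are hypothetical):

```sql
-- Enable dynamic partitioning for this session
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Each distinct value of dt becomes a partition directory;
-- every job that writes to the same partition adds its own file there.
INSERT INTO TABLE events PARTITION (dt)
SELECT id, payload, dt
FROM staging;
```

Hive does not append to or merge with files already present in the partition, which is the behavior this thread is about.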

Re: Building Hive

2014-01-28 Thread Stephen Sprague
you => useful. The rest of us schmucks => enlightened. Seems like a fair trade-off. :) On Tue, Jan 28, 2014 at 2:26 PM, Lefty Leverenz wrote: > > ... probably should have known that already. > Oh sure, assuming you spend your spare time reading release notes or browsing the Hive wiki. Instead…

Re: Building Hive

2014-01-28 Thread Lefty Leverenz
> ... probably should have known that already. Oh sure, assuming you spend your spare time reading release notes or browsing the Hive wiki. Instead you've given me a chance to publicize the Hive Schema Tool, which makes me feel useful, so thanks. Now all you need is answers to your other questions...

Re: Issue with Hive and table with lots of column

2014-01-28 Thread Stephen Sprague
There's always a use case out there that stretches the imagination, isn't there? Gotta love it. First things first: can you share the error message? The Hive version? And the number of nodes in your cluster? Then a couple of things come to mind. Might you consider pivoting the data such that…

RE: Building Hive

2014-01-28 Thread Peter Marron
Ah, thank you. I think that I probably should have known that already. Z From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: 28 January 2014 11:05 To: user@hive.apache.org Subject: Re: Building Hive I can only answer your last question about rebuilding the metastore: a new Hive schema tool…

Re: Building Hive

2014-01-28 Thread Lefty Leverenz
I can only answer your last question about rebuilding the metastore: a new Hive schema tool can do that for you, as described in the wiki here. This tool can be used to initialize the metastore schema…

Building Hive

2014-01-28 Thread Peter Marron
Hi, So I can see from http://hive.apache.org/downloads.html that I can download versions 0.11 and 0.12 and that they will work with Hadoop 1.0.4, which I am currently using. So if I want to start stepping through the source, to look into my problem with indexes, should I try to build version 0.11 or 0.12 with…