Re: Hive for a large datawarehouse

2011-01-25 Thread Michael Roessler
Our experience with Hive, and my personal opinion, is that while it is a remarkable achievement to enable simple select and simple "group by" SQL-type statements at scale, the HQL language remains too rudimentary to date to enable many of the select and ETL-type SQL statements common in many wareho

Hive for a large datawarehouse

2011-01-25 Thread Sheetal Dolas
Hi, We are exploring Hive for a very large data warehouse (up to 2 PB data size) and would like to get some information: 1. What are your experiences using Hive for large data warehouses? 2. What is the biggest Hive implementation that you have seen? 3. How is the query performance with petabytes

Stopping Hive Metastore Service

2011-01-25 Thread Matias Silva
Hi, I'm in the process of setting up an init script for the Hive metastore. What's the proper way to shut down the Hive metastore without killing the pid? Or is killing the pid the only way? http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin Thanks, Matt Matias Silva [Sr. Software Engin

Re: Can compression be used with ColumnarSerDe ?

2011-01-25 Thread yongqiang he
@Zheng, right now, no. I will open a new JIRA once I have some numbers. On Tue, Jan 25, 2011 at 9:51 AM, Zheng Shao wrote: > Is there a jira about the design of the new file format? > > Sent from my iPhone > > On Jan 24, 2011, at 5:40 PM, yongqiang he wrote: > >> 4M is a good number based on a lot of

Re: Can compression be used with ColumnarSerDe ?

2011-01-25 Thread Zheng Shao
Is there a JIRA about the design of the new file format? Sent from my iPhone On Jan 24, 2011, at 5:40 PM, yongqiang he wrote: > 4M is a good number based on a lot of experiments. Increasing the number > will reduce the file size, but the savings will increase very slowly > after the block size goes

Re: Distinct in hive

2011-01-25 Thread Namit Jain
Is there skew in the data? You may want to set the parameter hive.groupby.skewindata to true. Thanks, -namit From: Guy Doulberg <guy.doulb...@conduit.com> Reply-To: <user@hive.apache.org> Date: Tue, 25 Jan 2011 08:25:36 -0800 To: "user@hive.apache.org"
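
A minimal sketch of how the setting suggested in this reply might be applied in a Hive session. Only the hive.groupby.skewindata parameter comes from the thread; the table and column names (page_views, country, user_id) are hypothetical and used purely for illustration.

```sql
-- Parameter named in the reply above: ask Hive to plan GROUP BY
-- so that heavily skewed keys do not overload a single reducer.
set hive.groupby.skewindata=true;

-- Hypothetical query: count distinct users per group.
-- page_views, country, and user_id are illustrative names only.
SELECT country, COUNT(DISTINCT user_id) AS distinct_users
FROM page_views
GROUP BY country;
```

With this flag enabled, Hive generally splits the aggregation into two MapReduce stages, spreading rows across reducers for partial aggregation before the final grouping by key. Whether that resolves a given out-of-memory failure depends on the data and should be checked against the actual query.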

Distinct in hive

2011-01-25 Thread Guy Doulberg
Hey, We wrote a query in Hive that calculates the number of distinct values in a group by. On a small portion of the data it worked well; however, when we ran the query over a large portion of the data, it failed because of OutOfMemory errors in some of the reducers. We wonder how the distinct operator works in HI

Re: Is there a reason why this simple query would take a very long time?

2011-01-25 Thread Ajo Fod
One more thing to try: http://www.karmasphere.com/Karmasphere-Analyst/hive-queries-on-table-data.html#multi_group_by_inserts Look for this text: *"hive.map.aggr* controls how we do aggregations" Let me know if this hint helps. Cheers, Ajo. On Mon, Jan 24, 2011 at 2:01 PM, Jonathan Coveney wrot
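
A minimal sketch of the hint, assuming it is tried from a Hive session. Only the hive.map.aggr parameter is taken from the message; the query, table, and column names (events, event_type) are hypothetical, since the original query is not shown here.

```sql
-- Parameter named in the message above: enable map-side partial
-- aggregation so less data is shuffled to the reducers.
set hive.map.aggr=true;

-- Hypothetical aggregation; events and event_type are illustrative names only.
SELECT event_type, COUNT(*) AS event_count
FROM events
GROUP BY event_type;
```

Setting the value to false disables map-side aggregation, which can be worth comparing if mappers run short of memory.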