date:20110125

Re: Hive for a large datawarehouse

2011-01-25 Thread Michael Roessler

Our experience with Hive, and my personal opinion, is that while it is a remarkable achievement to enable simple select and simple "group by" SQL-type statements at scale, HQL remains too rudimentary to date to support many of the select and ETL-type SQL statements common in many wareho

Hive for a large datawarehouse

2011-01-25 Thread Sheetal Dolas

Hi, we are exploring Hive for a very large data warehouse (up to 2 PB of data) and would like to get some information:
1. What are your experiences using Hive for large data warehouses?
2. What is the biggest Hive implementation that you have seen?
3. How is the query performance with petabytes

Stopping Hive Metastore Service

2011-01-25 Thread Matias Silva

Hi, I'm in the process of setting up an init script for the Hive metastore. What's the proper way to shut down the Hive metastore without killing the pid? Or is killing the pid the only way? http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
Thanks,
Matt
Matias Silva [Sr. Software Engin

Re: Can compression be used with ColumnarSerDe ?

2011-01-25 Thread yongqiang he

@Zheng, right now no. Will open a new JIRA once I have some numbers.
On Tue, Jan 25, 2011 at 9:51 AM, Zheng Shao wrote:
> Is there a jira about the design of the new file format?
>
> Sent from my iPhone
>
> On Jan 24, 2011, at 5:40 PM, yongqiang he wrote:
>
>> 4M is a good number based on a lot of

Re: Can compression be used with ColumnarSerDe ?

2011-01-25 Thread Zheng Shao

Is there a jira about the design of the new file format?
Sent from my iPhone
On Jan 24, 2011, at 5:40 PM, yongqiang he wrote:
> 4M is a good number based on a lot of experiments. Increase the number
> will reduce the file size, but the saving will increase very slow
> after the block size goes

Re: Distinct in hive

2011-01-25 Thread Namit Jain

Is there skew in the data? You may want to set the parameter hive.groupby.skewindata to true.
Thanks,
-namit
From: Guy Doulberg <guy.doulb...@conduit.com>
Reply-To: <user@hive.apache.org>
Date: Tue, 25 Jan 2011 08:25:36 -0800
To: "user@hive.apache.org"
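
The reply above points at data skew as the likely cause of overloaded reducers. As a rough illustration of why skew hurts and how a two-stage aggregation can spread the load, here is a minimal Python sketch. It is a toy model of the general technique, not Hive's implementation; the thread does not describe what hive.groupby.skewindata does internally. The function names, the `hot` key, and the reducer count are all hypothetical, and the sketch covers plain counts rather than COUNT(DISTINCT).

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Deterministic stand-in for hash partitioning by group-by key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def one_stage(keys, num_reducers):
    """Send every row straight to the reducer that owns its key."""
    load = [0] * num_reducers  # rows received per reducer
    totals = Counter()
    for key in keys:
        load[partition(key, num_reducers)] += 1
        totals[key] += 1
    return totals, load


def two_stage(keys, num_reducers):
    """Pre-aggregate on evenly spread rows, then merge partials by key."""
    # Stage 1: round-robin stands in for a random distribution of rows.
    partials = [Counter() for _ in range(num_reducers)]
    for i, key in enumerate(keys):
        partials[i % num_reducers][key] += 1
    stage1_load = [sum(p.values()) for p in partials]

    # Stage 2: each partial count is routed by key and summed.
    stage2_load = [0] * num_reducers  # partial records received per reducer
    totals = Counter()
    for partial in partials:
        for key, count in partial.items():
            stage2_load[partition(key, num_reducers)] += 1
            totals[key] += count
    return totals, stage1_load, stage2_load


def skewed_keys():
    """Hypothetical input: one hot key plus 1000 rare keys."""
    return ["hot"] * 9000 + ["k%d" % i for i in range(1000)]


if __name__ == "__main__":
    keys = skewed_keys()
    _, load = one_stage(keys, 4)
    _, s1, s2 = two_stage(keys, 4)
    print("one-stage max rows on a reducer:", max(load))
    print("two-stage max rows per reducer:", max(s1), "then", max(s2), "partial records")
```

In the one-stage version the reducer that owns the hot key receives every one of its rows, while the two-stage version hands each first-stage worker an equal share and only ships small partial counts to the second stage.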

Distinct in hive

2011-01-25 Thread Guy Doulberg

Hey, we wrote a query in Hive that calculates the number of distinct values in a group by. On a small portion of the data it worked well; however, when we ran the query over a large portion of the data, it failed with OutOfMemory errors in some of the reducers. We wonder how the distinct operator works in HI

Re: Is there a reason why this simple query would take a very long time?

2011-01-25 Thread Ajo Fod

One more thing to try: http://www.karmasphere.com/Karmasphere-Analyst/hive-queries-on-table-data.html#multi_group_by_inserts
Look for this text: "hive.map.aggr controls how we do aggregations"
Let me know if this hint helps.
Cheers,
Ajo.
On Mon, Jan 24, 2011 at 2:01 PM, Jonathan Coveney wrot
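
The hint above concerns where aggregation happens. As a small illustration of the general idea of map-side (partial) aggregation, the Python sketch below compares a mapper that emits one record per row with one that pre-aggregates its split before the shuffle. This is a conceptual model only; the function names and sample splits are hypothetical, and it does not reproduce how Hive implements hive.map.aggr.

```python
from collections import Counter


def map_without_aggregation(rows):
    """Emit one (key, 1) record per input row."""
    return [(key, 1) for key in rows]


def map_with_aggregation(rows):
    """Pre-aggregate within the split and emit one record per distinct key."""
    return sorted(Counter(rows).items())


def reduce_counts(records):
    """Sum the counts for each key across all mapper outputs."""
    totals = Counter()
    for key, count in records:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    # Two hypothetical input splits, one per mapper.
    splits = [["a", "b", "a", "a"], ["b", "b", "c"]]
    raw = [rec for s in splits for rec in map_without_aggregation(s)]
    agg = [rec for s in splits for rec in map_with_aggregation(s)]
    print("records shuffled without map-side aggregation:", len(raw))
    print("records shuffled with map-side aggregation:", len(agg))
    print("final counts:", reduce_counts(agg))
```

Both paths give the same final counts; the difference is how many records cross the shuffle, which is the trade-off such a setting is presumably meant to control.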
