Re: bucketing in hive

Bejoy Ks Thu, 15 Dec 2011 04:14:20 -0800

Hi Ranjith
    I'm not aware of any dynamic bucketing in Hive, whereas dynamic partitions 
are definitely available. With dynamic partitioning, your partitions/sub-partitions 
are generated on the fly, based on the value of a particular column: records with 
the same value in that column go into the same partition. 
However, a dynamic partition load can't be done with a LOAD DATA statement, 
because it requires running a MapReduce job. For delimited files you can use 
dynamic partitions in two steps:
- Load the delimited file into a non-partitioned table in Hive using LOAD DATA.
- Load the data from that source table into the destination (partitioned) table 
using INSERT OVERWRITE. This triggers a MapReduce job that does the partitioning 
for you.
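
To illustrate what the second step does to the data, here is a minimal Python 
sketch (a simulation, not Hive code) of dynamic partitioning applied to the 
example rows from the question quoted below. Every distinct value of the 
partition column gets its own group, so the number of partitions comes from the 
data rather than being declared up front. The function name `dynamic_partition` 
is hypothetical and only models the grouping, not the MapReduce job Hive runs.

```python
from collections import defaultdict

# Rows from the example table in this thread, as (col1, col2) pairs.
ROWS = [("Apple", 1), ("Orange", 2), ("Apple", 2), ("Banana", 1)]


def dynamic_partition(rows, key_index=0):
    """Group rows by the value of one column, one group per distinct value.

    Hypothetical helper: it mimics the effect of a dynamic-partition
    INSERT OVERWRITE, where the number of partitions is decided by the data.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return dict(partitions)


if __name__ == "__main__":
    for value, part_rows in dynamic_partition(ROWS).items():
        # Hive would store each group under a directory named like col1=Apple
        print(f"col1={value}: {part_rows}")
```

With three distinct fruits the sketch yields three groups; a fourth fruit in the 
input would simply produce a fourth group without any change to the code.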

I have written something up on this; check whether it's useful for 
you.
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

Regards
Bejoy.K.S



________________________________
 From: "Raghunath, Ranjith" <ranjith.raghuna...@usaa.com>
To: "user@hive.apache.org" <user@hive.apache.org>; hive dev list 
<d...@hive.apache.org> 
Sent: Thursday, December 15, 2011 7:53 AM
Subject: bucketing in hive
 

 
Can one use bucketing in Hive to emulate hash partitions in a database? Is 
there also a way to segment data into buckets dynamically, based on the values in 
a column? For example:
 
Col1      Col2
Apple     1
Orange    2
Apple     2
Banana    1
 
If the file above were inserted into a table with Col1 as the bucket column, 
could we dynamically put all of the rows with “Apple” in one file, all of the 
rows with “Orange” in another, and so on? Is there a way to do this without 
specifying the number of buckets as 3?
Thank you, 
Ranjith
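
The hash-partition behaviour asked about above can be modelled with a short 
Python sketch (an illustration, not Hive's code). Bucketing assigns each row to 
bucket hash(key) mod N, where N is fixed when the table is declared with 
CLUSTERED BY ... INTO N BUCKETS. All rows with the same key therefore land in the 
same bucket, but distinct keys can share a bucket, which is why bucketing alone 
cannot promise one file per value. The hash used here is a Java-style 
String.hashCode, an assumption chosen for illustration: Hive's actual bucketing 
hash depends on the column type and Hive version. The helper names are 
hypothetical.

```python
# Rows from the example table above, as (col1, col2) pairs.
ROWS = [("Apple", 1), ("Orange", 2), ("Apple", 2), ("Banana", 1)]


def java_string_hash(s):
    """Java-style String.hashCode (h = 31*h + char) as a signed 32-bit int.

    Assumption: a stand-in hash for ASCII keys; Hive's real bucketing hash
    varies by column type and version.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def assign_buckets(rows, num_buckets, key_index=0):
    """Hypothetical helper: bucket = (hash & 0x7FFFFFFF) % num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        b = (java_string_hash(row[key_index]) & 0x7FFFFFFF) % num_buckets
        buckets[b].append(row)
    return buckets


if __name__ == "__main__":
    for i, bucket in enumerate(assign_buckets(ROWS, 3)):
        print(f"bucket {i}: {bucket}")
```

Even with N = 3 and exactly three distinct fruits, whether each fruit gets its 
own bucket depends entirely on the hash values; that is the key difference from 
the dynamic partitioning described in the reply, where each distinct value 
always gets its own partition.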
