Ya I very much agree with you on those lines. Using the basic stuff would 
literally run into memory issues  with large datasets. I had some of those 
resolved by using the DISTRIBUTE BY clause and so. In short a little work 
around over your hive queries could help you out in some cases.
Regards
Bejoy K S

-----Original Message-----
From: hadoopman <hadoop...@gmail.com>
Date: Sun, 14 Aug 2011 08:57:12 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: how to load data to partitioned table

Something else I've noticed is when loading LOTS of historical data, if 
you can try to say load a month of data at a time, try to just load THAT 
month of data and only that month.  I've been able to load several years 
of data (depending on the data) at a single load however there have been 
times when loading a large dataset that I would run into memory issues 
during the reduce phase (usually during shuffle/sort).  Things from out 
of memory to stack overflow messages (I've compiled a list of the more 
fun ones).

Then I noticed that only loading data from say a single month loaded 
quickly and without the memory headaches during the reduce.

Something to keep in mind and it works great!



On 08/12/2011 07:58 AM, bejoy...@yahoo.com wrote:
> Hi Daniel
> Just having a look at your requirement , to load data into a partition 
> based hive table from any input file the most hassle free approach 
> would be.
> 1. Load the data into a non partitioned table that shares similar 
> structure as the target table.
> 2. Populate the target table with the data from non partitioned one 
> using hive dynamic partition
> approach.
> With Dynamic partitions you don't need to manually identify the data 
> partitions and distribute data accordingly.
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <vikas.srivast...@one97.net>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<user@hive.apache.org>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: how to load data to partitioned table
>
> Hey ,
>
> Simpley you have run query like this
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) 
> SELECT *
>
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <hadoop...@163.com <mailto:hadoop...@163.com>>
>
>       suppose the table is partitioned by period_key, and the csv file
>     also has a column named as period_key. The csv file contains
>     multiple days of data, how can we load it in the the table?
>
>     I think of an workaround by first load the data into a
>     non-partition table, and then insert the data from non-partition
>     table to the partition table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>
>     However it doesn't work also. please help.
>
>
>
>
>
> -- 
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>


Reply via email to