Yes, I very much agree with you on those lines. Plain, untuned queries really do run into memory issues with large datasets. I resolved some of those by using the DISTRIBUTE BY clause and similar tweaks. In short, a small workaround in your Hive queries can help you out in some cases.

Regards
Bejoy K S
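
For illustration, here is a minimal sketch of the two workarounds discussed in this thread: DISTRIBUTE BY, and loading one month at a time. It assumes the sales_temp staging table and the partitioned sales table from the original question, that period_key is the last column of sales_temp, and that period_key is a string in yyyyMMdd form. The date format and the month boundaries are assumptions, since the thread does not show the data.

```sql
-- Allow dynamic partition inserts with no static partition value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Load a single month per statement (period_key format is assumed to be
-- yyyyMMdd), and send all rows of one partition to the same reducer.
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT *
FROM sales_temp
WHERE period_key >= '20110701' AND period_key < '20110801'
DISTRIBUTE BY period_key;
```

With DISTRIBUTE BY period_key, each reducer should only write the partitions whose keys are routed to it, rather than every task keeping a writer open for every partition. That is usually what relieves the memory pressure, though how much it helps depends on the data.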

-----Original Message-----
From: hadoopman <hadoop...@gmail.com>
Date: Sun, 14 Aug 2011 08:57:12
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: how to load data to partitioned table

Something else I've noticed when loading LOTS of historical data: if you can load, say, a month of data at a time, load just THAT month and only that month. I've been able to load several years of data (depending on the data) in a single load. However, there have been times when loading a large dataset ran into memory issues during the reduce phase (usually during shuffle/sort), with anything from out-of-memory to stack overflow messages (I've compiled a list of the more fun ones). Then I noticed that loading the data for a single month went quickly and without the memory headaches during the reduce. Something to keep in mind, and it works great!

On 08/12/2011 07:58 AM, bejoy...@yahoo.com wrote:
> Hi Daniel
> Just having a look at your requirement: to load data into a partitioned
> Hive table from any input file, the most hassle-free approach
> would be:
> 1. Load the data into a non-partitioned table that shares a similar
> structure with the target table.
> 2. Populate the target table with the data from the non-partitioned one
> using the Hive dynamic partition approach.
> With dynamic partitions you don't need to manually identify the data
> partitions and distribute the data accordingly.
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <vikas.srivast...@one97.net>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<user@hive.apache.org>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: how to load data to partitioned table
>
> Hey,
>
> Simply run a query like this:
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
> SELECT *
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <hadoop...@163.com>
>
>     Suppose the table is partitioned by period_key, and the CSV file
>     also has a column named period_key. The CSV file contains
>     multiple days of data. How can we load it into the table?
>
>     I thought of a workaround: first load the data into a
>     non-partitioned table, then insert the data from the
>     non-partitioned table into the partitioned table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>     However, that doesn't work either. Please help.
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
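
Putting the suggestions in the quoted messages together, the two-step load could look roughly like the sketch below. Only sales, sales_temp and period_key come from the thread. The other column names (sale_id, amount), the column types, the comma delimiter and the file path /tmp/sales.csv are hypothetical, since the table schema is never shown.

```sql
-- Step 1: a non-partitioned staging table with the same columns as the
-- target, keeping the partition column (period_key) last.
CREATE TABLE sales_temp (
  sale_id    BIGINT,   -- hypothetical column
  amount     DOUBLE,   -- hypothetical column
  period_key STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical local path to the CSV file.
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales_temp;

-- Step 2: dynamic partition insert into the partitioned table.
-- nonstrict mode is needed because no static partition value is given.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

FROM sales_temp
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT sale_id, amount, period_key;
```

The original statement failed because it had no PARTITION clause. With PARTITION (period_key) and the dynamic partition column selected last, Hive can derive the partition for each row from the data itself.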