Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-24 Thread Stephen Sprague
So msck repair table plus dynamic partitioning semantics looks like it fits the bill for you. Yeah, 300K partitions: that's getting up there on the scale of things with Hive, I'd say, and close to over-partitioning. For archival purposes, maybe older data doesn't need such fine-grained partitioning? s

Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-24 Thread Ivan Kruglov
Hi everyone, thank you for your answers. On 24.09.2013, at 0:36, Stephen Sprague wrote: > If it's any help, I've done this kind of thing frequently: > > 1. Create the table on the new cluster. > > 2. distcp the data right into the HDFS directory where the table resides on > the new cluster -

Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-23 Thread Stephen Sprague
If it's any help, I've done this kind of thing frequently: 1. Create the table on the new cluster. 2. distcp the data right into the HDFS directory where the table resides on the new cluster - no temp storage required. 3. Run this Hive command: msck repair table ; -- this command will create y
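
The copy-then-repair steps above can be sketched as a small Python helper that only assembles the two command lines and runs nothing. This is an illustration under stated assumptions, not the poster's actual script: the function name sync_commands, the namenode URIs, and the table name events are hypothetical, and the -update flag is an assumption added so that distcp copies into the existing table directory instead of nesting the source directory inside it.

```python
import shlex


def sync_commands(src_uri, dst_uri, table):
    """Build the commands for steps 2 and 3 as argv lists (nothing is executed).

    Step 1 (creating the table on the new cluster) is assumed to be done already.
    """
    # Step 2: copy the table's files straight into the target table directory.
    # "-update" is an assumption of this sketch, not something stated in the thread.
    distcp = ["hadoop", "distcp", "-update", src_uri, dst_uri]
    # Step 3: ask Hive to register the partition directories it finds under the table.
    repair = ["hive", "-e", f"MSCK REPAIR TABLE {table}"]
    return [distcp, repair]


if __name__ == "__main__":
    # Hypothetical locations and table name, for illustration only.
    for argv in sync_commands(
        "hdfs://nn-a:8020/user/hive/warehouse/events",
        "hdfs://nn-b:8020/user/hive/warehouse/events",
        "events",
    ):
        print(shlex.join(argv))
```

Printing the commands instead of executing them makes it easy to review them before running anything against a table with several hundred thousand partitions.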

Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-23 Thread Nitin Pawar
If I understand correctly, this is what you are trying to do: you have a data center (data center A) where the data is written to a Hive table; you have another data center (B) where you want to keep a backup of the table from data center A; you are using distcp to transfer data from data center A to B. if

Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-23 Thread Edward Capriolo
Did you try: ALTER TABLE table ADD IF NOT EXISTS PARTITION (partition=NULL); If that does not work, you will need to create a dynamic-partition-type query that will create the dummy partition. File a JIRA if the above syntax does not work. There should be SOME way to create the default partition by
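
For context on what the default partition looks like on disk: with dynamic partitioning, Hive places rows whose partition column is NULL or an empty string under a directory named after hive.exec.default.partition.name, which defaults to __HIVE_DEFAULT_PARTITION__. The Python sketch below mirrors only that naming rule. The helper name partition_dir and the column name dt are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
# Default value of hive.exec.default.partition.name.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_dir(column, value):
    """Return the directory name a dynamic-partition insert would use for one value.

    NULL (None here) and the empty string both map to the default partition.
    Escaping of special characters is deliberately left out of this sketch.
    """
    if value is None or value == "":
        value = HIVE_DEFAULT_PARTITION
    return f"{column}={value}"


if __name__ == "__main__":
    print(partition_dir("dt", "2013-09-23"))
    print(partition_dir("dt", None))
```

A directory with this name copied by distcp is exactly the case the thread is about: the files are in place, but the partition still has to be registered in the metastore.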

Creating partition with __HIVE_DEFAULT_PARTITION__ value

2013-09-23 Thread Ivan Kruglov
Hello everyone, I'm working on the task of syncing data between two tables which have a similar structure (read: the same set of partitions). The tables are in different data centers, and one table is a backup copy of the other. I'm trying to achieve this goal through distcp-ing data into targ