So msck repair table + dynamic partitioning semantics looks like it fits
the bill for you.
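For reference, a minimal sketch of what that repair step does (the table
name and partition column here are made up):

  -- assuming a table partitioned by dt, with directories laid out as
  -- .../warehouse/mytable/dt=2013-09-24/
  MSCK REPAIR TABLE mytable;
  -- scans the table's HDFS location and registers any partition
  -- directories that exist on disk but are missing from the metastore
  SHOW PARTITIONS mytable;  -- verify the partitions were added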
Yeah, 300K partitions. That's getting up there on the scale of things with
Hive, I'd say, and close to over-partitioning. For archival purposes, maybe
older data doesn't need such a fine-grained partitioning?
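For instance (a sketch only; the table names, columns, and partition layout
are assumptions), daily partitions of old data could be folded into coarser
monthly ones with a dynamic-partition insert:

  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  -- fold daily partitions of a hypothetical events table (partitioned
  -- by dt='YYYY-MM-DD') into a monthly-partitioned archive table
  INSERT OVERWRITE TABLE events_archive PARTITION (month)
  SELECT id, payload, substr(dt, 1, 7) AS month
  FROM events
  WHERE dt < '2013-01-01';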
Hi everyone,
Thank you for your answers.
On 24.09.2013, at 0:36, Stephen Sprague wrote:
> If it's any help I've done this kind of thing frequently: [...]
If it's any help, I've done this kind of thing frequently:
1. create the table on the new cluster.
2. distcp the data right into the HDFS directory where the table resides on
the new cluster - no temp storage required.
3. run this Hive command: msck repair table <table_name>; -- this command
will create your partitions in the metastore for every partition directory
it finds under the table's location.
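To make that concrete, the steps look roughly like this on the command line
(cluster addresses, warehouse paths, and the table name are placeholders):

  # step 2: copy the partition directories straight into the new table's location
  hadoop distcp hdfs://clusterA:8020/user/hive/warehouse/mytable \
                hdfs://clusterB:8020/user/hive/warehouse/mytable

  # step 3: register the copied directories as partitions in the metastore
  hive -e "msck repair table mytable;"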
If I understand correctly, this is what you are trying to do:
you have a data center where the data is written to a Hive table (data
center A);
you have another data center where you want to keep a backup of the table
from data center A;
you are using distcp to transfer the data from data center A to B;
if
Did you try ALTER TABLE table ADD IF NOT EXISTS PARTITION (partition=NULL);?
If that does not work, you will need to write a dynamic-partition query that
creates the dummy partition. File a JIRA if the above syntax does not work;
there should be SOME way to create the default partition by DDL.
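If the DDL is rejected, a dynamic-partition insert along these lines should
produce the default partition (table and column names are made up, and the
single inserted row is a throwaway dummy):

  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  -- a NULL in the partition column sends the row to the default partition
  -- (a directory named __HIVE_DEFAULT_PARTITION__ unless
  -- hive.exec.default.partition.name is set to something else)
  INSERT INTO TABLE mytable PARTITION (dt)
  SELECT id, payload, CAST(NULL AS STRING) AS dt
  FROM mytable LIMIT 1;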
Hello everyone,
I'm working on the task of syncing data between two tables which have a similar
structure (read: the same set of partitions). The tables are in different data
centers, and one table is a backup copy of the other. I'm trying to achieve
this goal through distcp-ing data into the target table's directory.