I agree.
It would be best if you wrote a script that iterates through each leaf-level
partition of your existing table (logdate='A', source='B', datacenter='C',
hostname='D') and populates the corresponding new leaf-level partitions in the
new table (logdate='A', source='B', datacenter='C', hostname='D', loghour).
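
One way for such a script to enumerate the leaf-level partitions is SHOW
PARTITIONS; roughly, the output looks like this (the values below are made up,
of course):

SHOW PARTITIONS test_table;
-- logdate=2012-06-18/source=app/datacenter=dc1/hostname=web01
-- logdate=2012-06-18/source=app/datacenter=dc1/hostname=web02
-- ...

Your script could parse each line into its key=value pairs and substitute them
into the per-partition INSERT shown further down.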

By leaf-level partition, I am referring to the partition column that appears
last in the list of partition columns.
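
For reference, the new table would simply add loghour as the last partition
column; a minimal sketch, assuming the same data columns and storage format as
test_table and that "dest" is whatever you actually name the new table:

CREATE TABLE dest (
    ts BIGINT,
    exec_time DOUBLE,
    domain_id BIGINT,
    domain_name STRING
)
PARTITIONED BY (logdate STRING, source STRING, datacenter STRING,
    hostname STRING, loghour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;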

Technically, you can do dynamic partitioning in one query with something like:

FROM test_table src
INSERT OVERWRITE TABLE dest PARTITION(logdate, source, datacenter, hostname,
    loghour)
    SELECT ts, exec_time, domain_id, domain_name, logdate, source, datacenter,
           hostname, loghour;
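
Note that for a fully dynamic insert like this you need to enable dynamic
partitioning in your session, and depending on how many partitions get
created you may also have to raise the limits (the limit values below are just
placeholders, tune them for your data):

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 10000;
SET hive.exec.max.dynamic.partitions.pernode = 1000;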

However, as far as I understand, you wouldn't be able to take advantage of the
existing partitioning in your source table (test_table). If you would like to
take advantage of the existing partitioning, you would have to issue a series
of queries like this (note that the statically specified partition columns are
not repeated in the SELECT list; only the dynamic one, loghour, goes at the
end):

FROM test_table src
INSERT OVERWRITE TABLE dest PARTITION(logdate='A', source='B', datacenter='C',
    hostname='D', loghour)
    SELECT ts, exec_time, domain_id, domain_name, loghour
    WHERE logdate='A' AND source='B' AND datacenter='C' AND hostname='D';
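
One thing to watch out for in both forms: loghour is not a column of
test_table, so you would have to compute it in the SELECT. Assuming ts is a
Unix timestamp in seconds (just a guess on my part), something like this in
place of the SELECT above would work:

    SELECT ts, exec_time, domain_id, domain_name,
           hour(from_unixtime(ts)) AS loghour
    WHERE logdate='A' AND source='B' AND datacenter='C' AND hostname='D';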

Just as a side note, whenever you think about your table partitioning, keep in
mind not to overdo it. Creating too many partitions can lead to a lot of small
files on HDFS, which hurts the performance of your Hadoop cluster. A few
people have written about this small files problem:
http://arunxjacob.blogspot.ca/2011/04/hdfs-file-size-vs-allocation-other.html
http://www.cloudera.com/blog/2009/02/the-small-files-problem/
http://blog.rapleaf.com/dev/2008/11/20/give-me-liberty-or-give-me-death-but-dont-give-me-small-files/

In general, you would like your files to be at least the HDFS block size,
ideally a small multiple of the block size. If you do find that you are
running into the small files problem, there are other ways to get around it,
such as bucketing.
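
For example, bucketing would let you drop the finest-grained partition columns
and still keep the data organized; a hypothetical sketch (the bucketing column
and bucket count are made up, pick them based on your data, and remember to
SET hive.enforce.bucketing = true; when inserting into it):

CREATE TABLE dest_bucketed (
    ts BIGINT,
    exec_time DOUBLE,
    domain_id BIGINT,
    domain_name STRING,
    datacenter STRING,
    hostname STRING
)
PARTITIONED BY (logdate STRING, source STRING)
CLUSTERED BY (hostname) INTO 32 BUCKETS
STORED AS TEXTFILE;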

Good luck!
Mark

----- Original Message -----
From: "Edward Capriolo" <edlinuxg...@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, June 19, 2012 11:12:48 AM
Subject: Re: Please HELP: HIVE alter table add new partition to schema...

You can not change the partition columns. I would use a dynamic
partition insert to select all the data from the original table into
the new table.

On 6/19/12, Tim Havens <timhav...@gmail.com> wrote:
> So...I have a table that has thousands of files, and billions of rows
> related to it.
>
> Let's make this a simple table:
>
> CREATE TABLE test_table (
>     ts BIGINT,
>     exec_time DOUBLE,
>     domain_id BIGINT,
>     domain_name STRING
> )
> PARTITIONED BY (logdate STRING, source STRING, datacenter STRING,
> hostname STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY '\001'
> MAP KEYS TERMINATED BY '\002'
> LINES TERMINATED BY '\n'
> STORED AS TextFile;
>
> So...what I need to do is ADD a partition to the PARTITIONED BY spec
> above....
>
> The partitioned by column I want to add is 'loghour STRING'.
>
> I can't seem to find any way to accomplish actually adding a NEW
> PARTITION COLUMN in the 'PARTITIONED BY' spec, without completely
> recreating and reloading the table.
>
> What's the correct way of adding a new column like 'loghour STRING' to the
> partition schema?
>
> I'm not trying to add an entry into the table DATA, I'm trying to add
> a completely new PARTITIONED BY Column...
>
