And here is a full example

----------------------------------------------------------------------------------------------------
--             bash
----------------------------------------------------------------------------------------------------

mkdir -p t
mkdir -p t/20150122/dudu/cust1
mkdir -p t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd
mkdir -p t/raj/20150204/cust1
mkdir -p t/raj/cust2/yyy/20150204/zzz
mkdir -p t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz

echo -e "1\n2\n3" > t/20150122/dudu/cust1/data.txt
echo -e "4"       > t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd/data.txt
echo -e "5\n6"    > t/raj/20150204/cust1/data.txt
echo -e "7\n8\n9" > t/raj/cust2/yyy/20150204/zzz/data.txt
echo -e "10"      > 
t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt

hdfs dfs -put t /tmp

----------------------------------------------------------------------------------------------------
--             hive
----------------------------------------------------------------------------------------------------


n  We’re creating the external table with the requested partition columns


create external table t (i int) partitioned by (user string,cust string,dt 
date) location '/tmp/t';


n  We’re choosing each partition values according the full path of the relevant 
directory

alter table t add partition (user='dudu',cust='cust1',dt=date '2015-01-22') 
location '/tmp/t/20150122/dudu/cust1';
alter table t add partition (user='dudu',cust='cust2',dt=date '2015-01-23') 
location '/tmp/t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd';
alter table t add partition (user='raj' ,cust='cust1',dt=date '2015-02-04') 
location '/tmp/t/raj/20150204/cust1';
alter table t add partition (user='raj' ,cust='cust2',dt=date '2015-02-04') 
location '/tmp/t/raj/cust2/yyy/20150204/zzz';
alter table t add partition (user='raj' ,cust='cust3',dt=date '2015-02-04') 
location '/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz';


n  The partitions’ values and their corresponding locations are all saved in 
the metastore

n  The metastore is being queried based on our query predicates. Returning the 
list of relevant partitions/locations


explain dependency select * from t where (cust like '%1' and dt < date 
'2015-02-01') or (user='raj' and substr(cust,-1) = 3) ;

{"input_partitions":[{"partitionName":"default@t@user=dudu/cust=cust1/dt=2015-01-22"},{"partitionName":"default@t@user=raj/cust=cust3/dt=2015-02-04"}],"input_tables":[{"tablename":"default@t","tabletype":"EXTERNAL_TABLE"}]}

select *,input__file__name from t where (cust like '%1' and dt < date 
'2015-02-01') or (user='raj' and substr(cust,-1) = 3) ;

1         dudu      cust1     2015-01-22         
hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
2         dudu      cust1     2015-01-22         
hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
3         dudu      cust1     2015-01-22         
hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
10        raj       cust3     2015-02-04          
hdfs://quickstart.cloudera:8020/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt



From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Monday, June 06, 2016 6:10 PM
To: user@hive.apache.org
Subject: RE: alter partitions on hive external table

… are just logical connections between certain values and specific directories …

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Monday, June 06, 2016 6:07 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: alter partitions on hive external table

Hi Raj


1.       I don’t understand the reason for this change, can you please 
elaborate?



2.       External table is just an interface. Instructions for how to read 
existing data.

Partitions of external table are just a logical connections between certain 
values and a specific directories.

You can connect any set of values to any directory no matter what the 
directories structure is and then query the external table filtering on this 
values and by that eliminating the query only to the directories you are 
interested in.



3.       By all means, don’t duplicate data without a good reason (unless you 
don’t care about wasting storage, time, CPU etc.)

It seems to me that all you need to do is to retrieve a list of the directories 
and generate “alter table … add partition…” statements based on that.


Dudu

From: raj hive [mailto:raj.hiv...@gmail.com]
Sent: Monday, June 06, 2016 6:02 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: alter partitions on hive external table

Hi friends,
I have created partitions on hive external tables. partitions on 
datetime/userid/customerId.
now i have to change the order of the partitions for the existing data for all 
the dates.
order of the partition is custerid/userid/datetime.
Anyone can help me, how to alter the partitions for the existing table. Need a 
help to write a script to change the partions on existing data. almost 3 months 
data is there to modify as per new partition so changing each date is 
difficult. Any expert can help me.
Thanks
Raj

Reply via email to