Thank you so much for the detailed info. On Tue, Jun 7, 2016 at 1:58 AM, Markovitz, Dudu <dmarkov...@paypal.com> wrote:
> And here is a full example > > > > > ---------------------------------------------------------------------------------------------------- > > -- bash > > > ---------------------------------------------------------------------------------------------------- > > > > mkdir -p t > > mkdir -p t/20150122/dudu/cust1 > > mkdir -p t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd > > mkdir -p t/raj/20150204/cust1 > > mkdir -p t/raj/cust2/yyy/20150204/zzz > > mkdir -p t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz > > > > echo -e "1\n2\n3" > t/20150122/dudu/cust1/data.txt > > echo -e "4" > t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd/data.txt > > echo -e "5\n6" > t/raj/20150204/cust1/data.txt > > echo -e "7\n8\n9" > t/raj/cust2/yyy/20150204/zzz/data.txt > > echo -e "10" > > t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt > > > > hdfs dfs -put t /tmp > > > > > ---------------------------------------------------------------------------------------------------- > > -- hive > > > ---------------------------------------------------------------------------------------------------- > > > > n We’re creating the external table with the requested partition columns > > > > create external table t (i int) partitioned by (user string,cust string,dt > date) location '/tmp/t'; > > > > n We’re choosing each partition values according the full path of the > relevant directory > > > > alter table t add partition (user='dudu',cust='cust1',dt=date '2015-01-22') > location '/tmp/t/20150122/dudu/cust1'; > > alter table t add partition (user='dudu',cust='cust2',dt=date '2015-01-23') > location '/tmp/t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd'; > > alter table t add partition (user='raj' ,cust='cust1',dt=date '2015-02-04') > location '/tmp/t/raj/20150204/cust1'; > > alter table t add partition (user='raj' ,cust='cust2',dt=date '2015-02-04') > location '/tmp/t/raj/cust2/yyy/20150204/zzz'; > > alter table t add partition (user='raj' ,cust='cust3',dt=date '2015-02-04') > location '/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz'; > > > > n The partitions’ values and their corresponding locations are all saved > in the metastore > > n The metastore is being queried based on our query predicates. > Returning the list of relevant partitions/locations > > > > explain dependency select * from t where (cust like '%1' and dt < date > '2015-02-01') or (user='raj' and substr(cust,-1) = 3) ; > > > > {"input_partitions":[{"*partitionName*":"default@t@user=dudu/cust=cust1 > /dt=2015-01-22"},{"*partitionName*":"default@t@user=raj/cust=cust3/dt= > 2015-02-04"}],"input_tables":[{"tablename":"default@t > ","tabletype":"EXTERNAL_TABLE"}]} > > > > select *,input__file__name from t where (cust like '%1' and dt < date > '2015-02-01') or (user='raj' and substr(cust,-1) = 3) ; > > > > 1 dudu cust1 2015-01-22 > hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt > > 2 dudu cust1 2015-01-22 > hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt > > 3 dudu cust1 2015-01-22 > hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt > > 10 raj cust3 2015-02-04 > hdfs://quickstart.cloudera:8020/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt > > > > > > > > *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com] > *Sent:* Monday, June 06, 2016 6:10 PM > > *To:* user@hive.apache.org > *Subject:* RE: alter partitions on hive external table > > > > … are just logical connections between certain values and specific > directories … > > > > *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com > <dmarkov...@paypal.com>] > *Sent:* Monday, June 06, 2016 6:07 PM > *To:* user@hive.apache.org > *Subject:* RE: alter partitions on hive external table > > > > Hi Raj > > > > 1. I don’t understand the reason for this change, can you please > elaborate? > > > > 2. External table is just an interface. Instructions for how to > read existing data. > > Partitions of external table are just a logical connections between > certain values and a specific directories. > > You can connect any set of values to any directory no matter what the > directories structure is and then query the external table filtering on > this values and by that eliminating the query only to the directories you > are interested in. > > > > 3. By all means, don’t duplicate data without a good reason (unless > you don’t care about wasting storage, time, CPU etc.) > > > > It seems to me that all you need to do is to retrieve a list of the > directories and generate “alter table … add partition…” statements based on > that. > > > > Dudu > > > > *From:* raj hive [mailto:raj.hiv...@gmail.com <raj.hiv...@gmail.com>] > *Sent:* Monday, June 06, 2016 6:02 AM > *To:* user@hive.apache.org > *Subject:* alter partitions on hive external table > > > > Hi friends, > > I have created partitions on hive external tables. partitions on > datetime/userid/customerId. > > now i have to change the order of the partitions for the existing data for > all the dates. > > order of the partition is custerid/userid/datetime. > > Anyone can help me, how to alter the partitions for the existing table. > Need a help to write a script to change the partions on existing data. > almost 3 months data is there to modify as per new partition so changing > each date is difficult. Any expert can help me. > > Thanks > > Raj >