Thank you so much for the detailed info.

On Tue, Jun 7, 2016 at 1:58 AM, Markovitz, Dudu <dmarkov...@paypal.com>
wrote:

> And here is a full example
>
>
>
>
> ----------------------------------------------------------------------------------------------------
>
> --             bash
>
>
> ----------------------------------------------------------------------------------------------------
>
>
>
> mkdir -p t
>
> mkdir -p t/20150122/dudu/cust1
>
> mkdir -p t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd
>
> mkdir -p t/raj/20150204/cust1
>
> mkdir -p t/raj/cust2/yyy/20150204/zzz
>
> mkdir -p t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz
>
>
>
> echo -e "1\n2\n3" > t/20150122/dudu/cust1/data.txt
>
> echo -e "4"       > t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd/data.txt
>
> echo -e "5\n6"    > t/raj/20150204/cust1/data.txt
>
> echo -e "7\n8\n9" > t/raj/cust2/yyy/20150204/zzz/data.txt
>
> echo -e "10"      >
> t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt
>
>
>
> hdfs dfs -put t /tmp
>
>
>
>
> ----------------------------------------------------------------------------------------------------
>
> --             hive
>
>
> ----------------------------------------------------------------------------------------------------
>
>
>
> n  We’re creating the external table with the requested partition columns
>
>
>
> create external table t (i int) partitioned by (user string,cust string,dt
> date) location '/tmp/t';
>
>
>
> n  We’re choosing each partition values according the full path of the
> relevant directory
>
>
>
> alter table t add partition (user='dudu',cust='cust1',dt=date '2015-01-22')
> location '/tmp/t/20150122/dudu/cust1';
>
> alter table t add partition (user='dudu',cust='cust2',dt=date '2015-01-23')
> location '/tmp/t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd';
>
> alter table t add partition (user='raj' ,cust='cust1',dt=date '2015-02-04')
> location '/tmp/t/raj/20150204/cust1';
>
> alter table t add partition (user='raj' ,cust='cust2',dt=date '2015-02-04')
> location '/tmp/t/raj/cust2/yyy/20150204/zzz';
>
> alter table t add partition (user='raj' ,cust='cust3',dt=date '2015-02-04')
> location '/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz';
>
>
>
> n  The partitions’ values and their corresponding locations are all saved
> in the metastore
>
> n  The metastore is being queried based on our query predicates.
> Returning the list of relevant partitions/locations
>
>
>
> explain dependency select * from t where (cust like '%1' and dt < date
> '2015-02-01') or (user='raj' and substr(cust,-1) = 3) ;
>
>
>
> {"input_partitions":[{"*partitionName*":"default@t@user=dudu/cust=cust1
> /dt=2015-01-22"},{"*partitionName*":"default@t@user=raj/cust=cust3/dt=
> 2015-02-04"}],"input_tables":[{"tablename":"default@t
> ","tabletype":"EXTERNAL_TABLE"}]}
>
>
>
> select *,input__file__name from t where (cust like '%1' and dt < date
> '2015-02-01') or (user='raj' and substr(cust,-1) = 3) ;
>
>
>
> 1         dudu      cust1     2015-01-22
> hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
>
> 2         dudu      cust1     2015-01-22
> hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
>
> 3         dudu      cust1     2015-01-22
> hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
>
> 10        raj       cust3     2015-02-04
> hdfs://quickstart.cloudera:8020/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt
>
>
>
>
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com]
> *Sent:* Monday, June 06, 2016 6:10 PM
>
> *To:* user@hive.apache.org
> *Subject:* RE: alter partitions on hive external table
>
>
>
> … are just logical connections between certain values and specific
> directories …
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com
> <dmarkov...@paypal.com>]
> *Sent:* Monday, June 06, 2016 6:07 PM
> *To:* user@hive.apache.org
> *Subject:* RE: alter partitions on hive external table
>
>
>
> Hi Raj
>
>
>
> 1.       I don’t understand the reason for this change, can you please
> elaborate?
>
>
>
> 2.       External table is just an interface. Instructions for how to
> read existing data.
>
> Partitions of external table are just a logical connections between
> certain values and a specific directories.
>
> You can connect any set of values to any directory no matter what the
> directories structure is and then query the external table filtering on
> this values and by that eliminating the query only to the directories you
> are interested in.
>
>
>
> 3.       By all means, don’t duplicate data without a good reason (unless
> you don’t care about wasting storage, time, CPU etc.)
>
>
>
> It seems to me that all you need to do is to retrieve a list of the
> directories and generate “alter table … add partition…” statements based on
> that.
>
>
>
> Dudu
>
>
>
> *From:* raj hive [mailto:raj.hiv...@gmail.com <raj.hiv...@gmail.com>]
> *Sent:* Monday, June 06, 2016 6:02 AM
> *To:* user@hive.apache.org
> *Subject:* alter partitions on hive external table
>
>
>
> Hi friends,
>
> I have created partitions on hive external tables. partitions on
> datetime/userid/customerId.
>
> now i have to change the order of the partitions for the existing data for
> all the dates.
>
> order of the partition is custerid/userid/datetime.
>
> Anyone can help me, how to alter the partitions for the existing table.
> Need a help to write a script to change the partions on existing data.
> almost 3 months data is there to modify as per new partition so changing
> each date is difficult. Any expert can help me.
>
> Thanks
>
> Raj
>

Reply via email to