[ https://issues.apache.org/jira/browse/HIVE-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575784#comment-16575784 ]
Ilkka Peltola commented on HIVE-6589:
-------------------------------------

The U-SQL language has [an example implementation|https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/input-files-u-sql#inputfilesetpath] of this. It might serve as a reference implementation, if that makes it easier to get started. I have used U-SQL (or rather SCOPE, its predecessor) and really miss that functionality in Hive. It makes loading data into external tables and partitioning them a great deal easier, since data so often arrives in the layout folder/year/month/day/file.ext.

> Automatically add partitions for external tables
> ------------------------------------------------
>
>                 Key: HIVE-6589
>                 URL: https://issues.apache.org/jira/browse/HIVE-6589
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.14.0
>            Reporter: Ken Dallmeyer
>            Assignee: Dharmendra Pratap Singh
>            Priority: Major
>
> I have a data stream being loaded into Hadoop via Flume. It loads into a date
> partition folder in HDFS. The path looks like this:
> {code}
> /flume/my_data/YYYY/MM/DD/HH
> /flume/my_data/2014/03/02/01
> /flume/my_data/2014/03/02/02
> /flume/my_data/2014/03/02/03
> {code}
> On top of it I create an EXTERNAL Hive table for querying. As of now, I have
> to add partitions manually. What I want is for Hive to "discover" those
> partitions for EXTERNAL tables. Additionally, I would like to specify a
> partition pattern, so that when I query, Hive will know to use the pattern to
> find the HDFS folder.
> So something like this:
> {code}
> CREATE EXTERNAL TABLE my_data (
>   col1 STRING,
>   col2 INT
> )
> PARTITIONED BY (
>   dt STRING,
>   hour STRING
> )
> LOCATION '/flume/my_data'
> TBLPROPERTIES (
>   'hive.partition.spec' = 'dt=$Y-$M-$D, hour=$H',
>   'hive.partition.spec.location' = '$Y/$M/$D/$H'
> );
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
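
For context, the manual registration the report describes corresponds to statements like the following sketch (table name, partition values, and paths are taken from the issue's example; this is the existing workaround, not the proposed feature):

{code}
-- Today, each new HDFS folder must be registered as a partition explicitly:
ALTER TABLE my_data ADD IF NOT EXISTS
  PARTITION (dt='2014-03-02', hour='01')
  LOCATION '/flume/my_data/2014/03/02/01';

-- MSCK REPAIR TABLE can discover partitions automatically, but only when the
-- directories follow the key=value convention (e.g. /dt=2014-03-02/hour=01),
-- which is exactly what the Flume layout above does not use:
MSCK REPAIR TABLE my_data;
{code}

A partition-spec pattern in TBLPROPERTIES, as proposed above, would remove the need for either statement for layouts like this one.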