[ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355615#comment-17355615 ]
Abhinav Kumar commented on HIVE-951:
------------------------------------

Any update on this one? It saves a ton on data duplication as well.

> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
>                 Key: HIVE-951
>                 URL: https://issues.apache.org/jira/browse/HIVE-951
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Priority: Major
>         Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular
> expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists
> outside of Hive, and currently assumes that all of the files located under
> the supplied path should be included in the new table. Users frequently
> encounter directories containing multiple datasets, or directories that
> contain data in heterogeneous schemas, and it's often impractical or
> impossible to adjust the layout of the directory to meet the requirements of
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external
> table based on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename
> matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE
> LOCATION 'hdfs://data/' 'xyz.*2009????.bz2$';
> {code}
> Creates mytable2, which includes all files matching 'xyz*2009????.bz2' located
> under hdfs://data/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
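
For readers following the thread, the selection rule proposed in the issue can be sketched outside Hive. This is only an illustration of the intended semantics, not the attached patch: the helper `select_files` and the bucket listing are hypothetical, and the sketch assumes the regex is applied as an unanchored search against each path relative to the LOCATION.

```python
import re


def select_files(paths, location, file_regex):
    """Return the paths under `location` whose relative name matches `file_regex`.

    Hypothetical helper: it assumes an unanchored search on the path
    relative to `location`, which is how the issue's examples read.
    """
    pattern = re.compile(file_regex)
    prefix = location if location.endswith("/") else location + "/"
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # not under the table location at all
        relative = path[len(prefix):]
        # search(), not fullmatch(): the examples only anchor the end with `$`.
        if pattern.search(relative):
            selected.append(path)
    return selected


# Invented listing, for illustration only.
listing = [
    "s3://my.bucket/folder/2009-01.bz2",
    "s3://my.bucket/folder/2009-02.bz2",
    "s3://my.bucket/folder/2008-12.bz2",
    "s3://my.bucket/other/2009-01.bz2",
    "s3://my.bucket/folder/2009-01.bz2.tmp",
]

matched = select_files(listing, "s3://my.bucket/", r"folder/2009.*\.bz2$")
print(matched)
```

Under these assumptions only the two `folder/2009-*.bz2` objects are selected; the 2008 file, the file under `other/`, and the `.tmp` file are skipped. Note that the `????` in the issue's second example reads like glob syntax; as a regular expression, four arbitrary characters would more likely be written as `.{4}`.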