[ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748613#comment-13748613 ]
Edward Capriolo commented on HIVE-951:
--------------------------------------

This is being handled via another JIRA, I think. This will be done via virtual column support. The file name will be a string that can be used as part of a WHERE clause.

> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
>                 Key: HIVE-951
>                 URL: https://issues.apache.org/jira/browse/HIVE-951
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>         Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and
> currently assumes that all of the files located under the supplied path should be included
> in the new table. Users frequently encounter directories containing multiple datasets, or
> directories that contain data in heterogeneous schemas, and it's often impractical or impossible
> to adjust the layout of the directory to meet the requirements of CREATE EXTERNAL TABLE.
> A good example of this problem is creating an external table based on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'.
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE
> LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
> {code}
> Creates mytable2, which includes all files matching 'xyz*2009????.bz2' located under hdfs://data/.
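
The file selection that the proposed file_regex argument describes can be sketched outside Hive. This is only an illustration, not the behavior of HIVE-951.patch: it assumes the regex is searched against each file's path relative to LOCATION, and the helper name select_files and the file listings are hypothetical. The second pattern writes the glob 'xyz*2009????.bz2' as a regular expression, using .{4} for the four wildcard characters.

```python
import re

def select_files(paths, file_regex):
    """Return the relative paths in which the regex finds a match."""
    pattern = re.compile(file_regex)
    return [p for p in paths if pattern.search(p)]

# Hypothetical object keys under 's3://my.bucket/'.
listing = [
    "folder/2009-01-01.bz2",
    "folder/2009-logs.txt",
    "folder/2010-01-01.bz2",
    "other/2009-01-01.bz2",
]

# Only .bz2 files under folder/ whose names start with 2009 are kept.
mytable1_files = select_files(listing, r"folder/2009.*\.bz2$")

# Hypothetical file names under 'hdfs://data/'.
mytable2_files = select_files(
    ["xyz_20090101.bz2", "xyz_2009.bz2"],
    r"xyz.*2009.{4}\.bz2$",
)

print(mytable1_files)
print(mytable2_files)
```

Under these assumptions, files in the same directory that do not match the pattern are simply left out of the table, which is the effect the proposal is after for mixed-content directories.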