[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829220#comment-13829220 ]
Harish Butani commented on HIVE-4956:
-------------------------------------

I have a question:
- would it be possible to expose this via a special Storage Handler whose parameters capture the underlying tables?
- the advantage from a usage perspective is that it is explicit in the metadata and not embedded in a query; you do it once and all users get to use it.

So something like:
{noformat}
CREATE TABLE T(key int, value string)
STORED BY 'org.apache.hadoop.hive.UnionStorageHandler'
WITH SERDEPROPERTIES (
  "primary.table" = "T1",
  "secondary.tables" = "T2"
);
-- the primary specifies the partitioning of this table.
{noformat}

> Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4956
>                 URL: https://issues.apache.org/jira/browse/HIVE-4956
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> We have a use case where the table storage partitioning changes over time.
> For example:
> we can have a table T1 which is partitioned by p1. But over time, we want to
> partition the table on p1 and p2 as well. The new table can be T2. So, if we
> have to query the table on partition p1, it will be a union query across the
> two tables T1 and T2. Especially with aggregations like avg, it becomes a
> costly union query because we cannot make use of map-side aggregations and
> other optimizations.
> The proposal is to support queries of the following format:
> select t.x, t.y, .... from T1,T2 t where t.p1='x' OR t.p1='y' ...
> [groupby-clause] [having-clause] [orderby-clause] and so on.
> Here we allow the from clause to be a comma-separated list of tables with an
> alias; the alias will be used in the full query, and partition pruning will
> happen on the actual tables to pick up the right paths.
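The proposal relies on partition pruning being applied to each underlying table with the same predicate, so that only the list of input paths differs from a single-table query. A minimal Java (16+, for records) sketch of that step, assuming a simplified in-memory list of partitions: the `Partition` record, the `inputPaths` helper and the `/warehouse/...` locations are hypothetical and are not Hive's metastore or pruner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MultiTablePruningSketch {

    // Hypothetical stand-in for a metastore partition: owning table,
    // partition spec (column -> value) and the directory holding its data.
    record Partition(String table, Map<String, String> spec, String location) {}

    // Applies one partition predicate to every table named in the FROM list
    // and collects the locations of the partitions that survive. In this
    // sketch, only this input-path list differs from a single-table query.
    static List<String> inputPaths(List<String> tables, List<Partition> partitions,
                                   Predicate<Map<String, String>> predicate) {
        List<String> paths = new ArrayList<>();
        for (Partition p : partitions) {
            if (tables.contains(p.table()) && predicate.test(p.spec())) {
                paths.add(p.location());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // T1 is partitioned by p1; T2 by p1 and p2. T3 is not in the FROM list.
        List<Partition> partitions = List.of(
            new Partition("T1", Map.of("p1", "x"), "/warehouse/t1/p1=x"),
            new Partition("T1", Map.of("p1", "z"), "/warehouse/t1/p1=z"),
            new Partition("T2", Map.of("p1", "y", "p2", "a"), "/warehouse/t2/p1=y/p2=a"),
            new Partition("T2", Map.of("p1", "y", "p2", "b"), "/warehouse/t2/p1=y/p2=b"),
            new Partition("T3", Map.of("p1", "x"), "/warehouse/t3/p1=x"));

        // Equivalent of: ... from T1,T2 t where t.p1='x' OR t.p1='y'
        Predicate<Map<String, String>> p1IsXOrY =
            spec -> "x".equals(spec.get("p1")) || "y".equals(spec.get("p1"));

        // Prints /warehouse/t1/p1=x, /warehouse/t2/p1=y/p2=a, /warehouse/t2/p1=y/p2=b
        for (String path : inputPaths(List.of("T1", "T2"), partitions, p1IsXOrY)) {
            System.out.println(path);
        }
    }
}
```

Because the predicate only constrains p1, every p2 sub-partition of T2 under p1=y is selected, while T1's coarser partitions are pruned on p1 alone; the query logic that consumes the paths is shared.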
> This will work because the difference is only in picking up the input paths;
> the whole operator tree does not change. If this sounds like a good use case,
> I can put up the changes required to support it.

--
This message was sent by Atlassian JIRA
(v6.1#6144)