[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

Harish Butani (JIRA) Thu, 21 Nov 2013 11:01:30 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829220#comment-13829220
 ]


Harish Butani commented on HIVE-4956:
-------------------------------------

I have a question:
- Would it be possible to expose this via a special Storage Handler whose 
parameters capture the underlying tables?
- The advantage from a usage perspective is that it is explicit in metadata and 
not embedded in a query; you define it once and all users get to use it.

So something like:
{noformat}
CREATE TABLE T(key int, value string) 
STORED BY 'org.apache.hadoop.hive.UnionStorageHandler'
WITH SERDEPROPERTIES (
"primary.table" = "T1",
"secondary.tables" = "T2"
);
-- the primary specifies the partitioning of this table.
{noformat}
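
To make the usage difference concrete, here is a rough sketch of how a query 
might look with and without such a table. It assumes T1 and T2 both have the 
columns (key int, value string) and the partition column p1 from the issue 
description, and that T would expose p1 through its primary table; the 
UnionStorageHandler itself is hypothetical and does not exist in Hive today.
{noformat}
-- Today: an explicit union in a subquery, repeated in every user's query.
SELECT t.value, avg(t.key)
FROM (
  SELECT key, value FROM T1 WHERE p1 = 'x'
  UNION ALL
  SELECT key, value FROM T2 WHERE p1 = 'x'
) t
GROUP BY t.value;

-- With the hypothetical union-backed table T: the underlying tables come
-- from metadata, so the query names only T.
SELECT value, avg(key)
FROM T
WHERE p1 = 'x'
GROUP BY value;
{noformat}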

> Allow multiple tables in from clause if all them have the same schema, but 
> can be partitioned differently
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4956
>                 URL: https://issues.apache.org/jira/browse/HIVE-4956
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> We have a use case where the table storage partitioning changes over time.
> For example:
>  we can have a table T1 which is partitioned by p1. But over time, we want to 
> partition the table on p1 and p2 as well. The new table can be T2. So, if we 
> have to query the table on partition p1, it will be a union query across the 
> two tables T1 and T2. Especially with aggregations like avg, it becomes a 
> costly union query because we cannot make use of map-side aggregations and 
> other optimizations.
> The proposal is to support queries of the following format:
> select t.x, t.y, .... from T1,T2 t where t.p1='x' OR t.p1='y' ... 
> [groupby-clause] [having-clause] [orderby-clause] and so on.
> Here we allow the from clause to be a comma-separated list of tables with an 
> alias; the alias will be used in the full query, and partition pruning will 
> happen on the actual tables to pick up the right paths. This will work because 
> the difference is only in picking up the input paths, and the whole operator 
> tree does not change. If this sounds like a good use case, I can put up the 
> changes required to support the same.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
