conor garvey gelvin created HIVE-7501:
-----------------------------------------

             Summary: Automatic Aggregations in Partitioned Tables
                 Key: HIVE-7501
                 URL: https://issues.apache.org/jira/browse/HIVE-7501
             Project: Hive
          Issue Type: Improvement
          Components: Database/Schema
            Reporter: conor garvey gelvin


Aggregations are considered fundamental to OLAP systems, as they provide the
large speedup necessary for real-world applications of databases. The number of
possible aggregations (count, sum, max, among others) is proportional to the
product of all aggregatable dimensions in a table, so computing them in their
entirety requires an infeasible amount of time. Memory constraints are a further
reason to keep the computed subset small. Manually selecting the subset to
compute and save for future use is also not entirely acceptable for modern
systems: it is a trivial task that any user could do with a simple HiveQL
command, which makes it a natural candidate for automation.

An automatic way to compute the aggregations is therefore desirable. Proposal:
In a partitioned table, the results of built-in Hive aggregate functions over a
partition are saved in a table, per partition, once the user has asked for that
aggregated data for the first time.
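
The proposed behaviour can be illustrated with a minimal sketch. It is an
in-memory model written in Python, not Hive code, and every name in it
(PARTITIONS, AGGREGATES, PartitionAggregateCache) is hypothetical. It assumes
that a saved result is keyed by partition and aggregate function, and that the
first request for a given key triggers the scan.

```python
# Hypothetical in-memory stand-in for a partitioned table:
# partition value -> rows (here, just numeric amounts).
PARTITIONS = {
    "2014-07-22": [10.0, 20.0, 5.0],
    "2014-07-23": [7.5, 2.5],
}

# Built-in aggregates this sketch knows how to save (assumed subset).
AGGREGATES = {"count": len, "sum": sum, "max": max}


class PartitionAggregateCache:
    """Saves each (partition, aggregate) result after the first request."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.saved = {}   # (partition, aggregate name) -> saved result
        self.scans = 0    # how many times a partition was actually scanned

    def aggregate(self, partition, name):
        key = (partition, name)
        if key not in self.saved:
            # First request for this key: scan the partition, save the result.
            self.scans += 1
            self.saved[key] = AGGREGATES[name](self.partitions[partition])
        # Later requests are answered from the saved result.
        return self.saved[key]


cache = PartitionAggregateCache(PARTITIONS)
first = cache.aggregate("2014-07-22", "sum")    # scans the partition
second = cache.aggregate("2014-07-22", "sum")   # served from the saved result
```

Only the first call scans the partition; the second returns the saved value.
The sketch does not cover invalidating saved results when a partition's data
changes, which the proposal leaves open.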

This provides a mechanism for overnight aggregations: the user can simply
compute the aggregation once overnight and then, during the day, automatically
use the saved aggregated data for data mining.

Critique, suggestions and development welcome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
