Hi All,

Does Hive have an automated database designer, or has anyone tried building one? I am looking for something equivalent to Vertica's DBDesigner and Microsoft SQL Server's automated partitioning design for parallel databases.
References:
1. Automated Partitioning Design in Parallel Database Systems (https://cs.brown.edu/courses/cs227/archives/2012/papers/partitioning/p1137-nehme.pdf)
2. DBDesigner: A Customizable Physical Design Tool for Vertica Analytic Database (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6816725)

Hive tuning tips mention the need for pre-sorting tables on filter columns (for better predicate pushdown and joins), partitioning/clustering on join and group-by columns, keeping a higher replication factor for dimension tables, etc. However, I couldn't find any tool or library that suggests a physical layout given a set of Hive queries.

Manually designing the physical layout doesn't scale, especially when the producers and consumers of the tables (data) are multiple different teams. Different queries impose conflicting optimization requirements, and a globally optimal design can be very different from a locally optimal one.

If someone in the community has worked on this or can give pointers, it would be extremely useful for us.

Thanks & Regards,
Umesh Prasad
Team Lead, Flipkart