[ https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493746#comment-13493746 ]
Mark Grover commented on HIVE-933:
----------------------------------

Have we considered a case where the partition metadata conflicts with the table level metadata?

For example, let's create a table output where we specify it will be clustered by 32 buckets during creation time. When we are populating a given partition of table output, we populate all in 1 bucket (a rather common mistake). At this point, the partition and table metadata are contradictory. For use in further queries, would we always choose the partition metadata over the table level metadata?

> Infer bucketing/sorting properties
> ----------------------------------
>
>                 Key: HIVE-933
>                 URL: https://issues.apache.org/jira/browse/HIVE-933
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties, and
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata.
> Currently, the query plan is same for all the
> partitions of the table - we can do the following:
> 1. In the first cut, have a simple approach where you take the union all
> metadata, and create the most defensive plan.
> 2. Enhance mapredWork() to include partition specific operator trees.
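
One possible reading of step 1 in the quoted description (combine the metadata of all partitions and plan defensively) is to keep only the properties that every partition shares. The sketch below illustrates that reading with the 32-bucket versus 1-bucket conflict from the comment. It is not Hive code: `BucketingInfo`, `common_prefix`, `most_defensive`, and the column names are hypothetical names invented for this illustration, and treating "defensive" as "assume nothing that any partition contradicts" is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BucketingInfo:
    """Hypothetical layout metadata for one partition (not a Hive class)."""
    num_buckets: int = 0                 # 0 means "not bucketed"
    bucket_cols: Tuple[str, ...] = ()
    sort_cols: Tuple[str, ...] = ()


def common_prefix(seqs: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Longest leading run of columns shared by every sequence."""
    prefix = []
    for cols in zip(*seqs):
        if all(c == cols[0] for c in cols):
            prefix.append(cols[0])
        else:
            break
    return tuple(prefix)


def most_defensive(partitions: Sequence[BucketingInfo]) -> BucketingInfo:
    """Keep only the properties that hold for every partition.

    Assumption: if partitions disagree on bucket count or bucket columns,
    the planner treats the data as not bucketed. Data sorted by (a, b) is
    also sorted by (a), so the shared sort-column prefix is kept.
    """
    if not partitions:
        return BucketingInfo()
    first = partitions[0]
    same_bucketing = all(
        p.num_buckets == first.num_buckets and p.bucket_cols == first.bucket_cols
        for p in partitions
    )
    return BucketingInfo(
        num_buckets=first.num_buckets if same_bucketing else 0,
        bucket_cols=first.bucket_cols if same_bucketing else (),
        sort_cols=common_prefix([p.sort_cols for p in partitions]),
    )


# The scenario from the comment: one partition written with 32 buckets,
# another populated into a single bucket by mistake.
declared = BucketingInfo(32, ("userid",), ("userid", "ts"))
mistaken = BucketingInfo(1, ("userid",), ("userid",))
combined = most_defensive([declared, mistaken])
```

Under these assumptions, `combined` carries no bucketing guarantee and only the shared sort prefix, so a plan built from it would not rely on a bucket count that some partition violates. Whether the planner should instead prefer partition metadata over table metadata, as the comment asks, is left open by this sketch.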