[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014813#comment-14014813 ]
Bikas Saha commented on HIVE-7158:
----------------------------------

To be clear, when auto reduce is enabled, will we set the stat-computed value and try to decrease it at runtime, or will we use some configured max value and decrease that at runtime? If it's the latter, are there any downsides to always partitioning data with high cardinality (say 1000)?

Orthogonally, are there cases in the query plan where 2 branches of a query have a cardinality dependency? E.g. both are partitioned 100 ways and will later be joined without re-partitioning them. In those cases, auto-reduce cannot be turned on for either side because it can change the cardinality differently on both sides.

> Use Tez auto-parallelism in Hive
> --------------------------------
>
>                 Key: HIVE-7158
>                 URL: https://issues.apache.org/jira/browse/HIVE-7158
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>         Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch
>
>
> Tez can optionally sample data from a fraction of the tasks of a vertex and
> use that information to choose the number of downstream tasks for any given
> scatter gather edge.
> Hive estimates the count of reducers by looking at stats and estimates for
> each operator in the operator pipeline leading up to the reducer. However, if
> this estimate turns out to be too large, Tez can rein in the resources used
> to compute the reducer.
> It does so by combining partitions of the upstream vertex. It cannot,
> however, add reducers at this stage.
> I'm proposing to let users specify whether they want to use auto-parallelism
> or not. If they do, there will be scaling factors to determine the max and min
> reducers Tez can choose from. We will then partition by max reducers, letting
> Tez sample and rein in the count down to the specified min.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
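
For illustration, the scheme in the issue description can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Hive's or Tez's actual code: the function names, the two scaling-factor parameters, and the bytes-per-reducer target are all hypothetical, and it assumes the runtime projects total output size from the sampled tasks and then merges contiguous upstream partitions.

```python
import math


def reducer_bounds(estimated_reducers, max_factor, min_factor):
    """Scale the stats-based estimate into (max_reducers, min_reducers).

    max_factor and min_factor are hypothetical names for the scaling factors.
    """
    max_reducers = max(1, math.ceil(estimated_reducers * max_factor))
    min_reducers = max(1, math.floor(estimated_reducers * min_factor))
    return max_reducers, min(min_reducers, max_reducers)


def choose_reducer_count(sampled_bytes, sampled_tasks, total_tasks,
                         bytes_per_reducer, max_reducers, min_reducers):
    """Pick a reducer count from a sample of upstream tasks.

    The count can only shrink from max_reducers (the number of partitions
    already written upstream) and never drops below min_reducers.
    """
    # Assumption: project total output linearly from the sampled tasks.
    projected_bytes = sampled_bytes * total_tasks // sampled_tasks
    # Integer ceiling division.
    wanted = -(-projected_bytes // bytes_per_reducer)
    return max(min_reducers, min(max_reducers, wanted))


def assign_partitions(num_partitions, num_reducers):
    """Combine upstream partitions: each reducer reads a contiguous range."""
    if not 1 <= num_reducers <= num_partitions:
        raise ValueError("reducers can be merged but never added")
    base, extra = divmod(num_partitions, num_reducers)
    ranges, start = [], 0
    for i in range(num_reducers):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Example values are made up for illustration.
max_r, min_r = reducer_bounds(estimated_reducers=100, max_factor=2.0, min_factor=0.25)
chosen = choose_reducer_count(
    sampled_bytes=2**30, sampled_tasks=10, total_tasks=100,
    bytes_per_reducer=256 * 2**20, max_reducers=max_r, min_reducers=min_r,
)
plan = assign_partitions(max_r, chosen)
```

Note that this sketch shrinks each vertex independently, which is exactly the situation raised in the comment: two branches partitioned the same way and later joined without re-partitioning could end up with different reducer counts.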