[ https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278519#comment-17278519 ]
okumin commented on HIVE-24485:
-------------------------------

I got a notification that the PR has been marked as stale.

[~gopalv] or anyone familiar with Hive on Tez: Could you please take a look when you have a chance?

> Make the slow-start behavior tunable
> ------------------------------------
>
>                 Key: HIVE-24485
>                 URL: https://issues.apache.org/jira/browse/HIVE-24485
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, Tez
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This ticket would enable users to configure the timing of slow-start with
> `tez.shuffle-vertex-manager.min-src-fraction` and
> `tez.shuffle-vertex-manager.max-src-fraction`.
> Hive on Tez currently doesn't honor these parameters, and ShuffleVertexManager
> always uses the default values.
> If we can tweak these parameters, we can control the timing to start vertexes
> and the accuracy of the estimated input size. This is useful when a vertex has
> tasks that process different amounts of data.
>
> We can reproduce the issue with this query.
> {code:java}
> SET hive.tez.auto.reducer.parallelism=true;
> SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism
> SET tez.shuffle-vertex-manager.min-src-fraction=0.55;
> SET tez.shuffle-vertex-manager.max-src-fraction=0.95;
> CREATE TABLE mofu (name string);
> INSERT INTO mofu (name) VALUES ('12345');
> SELECT name, count(*) FROM mofu GROUP BY name;{code}
> The fractions are ignored: the log still reports the defaults (0.25 and 0.75).
> {code:java}
> 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: 0.75 auto: true desiredTaskIput: 256000000
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
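
For context on the two parameters named in the ticket: as far as the Tez configuration documentation describes it, tasks of the downstream vertex start being scheduled once `min-src-fraction` of the source tasks have completed, all of them can be scheduled once `max-src-fraction` have completed, and the number scales linearly in between. The sketch below only models that ramp. The class `SlowStartSketch` and the method `schedulableFraction` are hypothetical names chosen for illustration; this is not the actual ShuffleVertexManager code, and details such as rounding to whole tasks are left out.

```java
// Hypothetical helper, not part of Tez or Hive: models the documented
// slow-start ramp between min-src-fraction and max-src-fraction.
public final class SlowStartSketch {
    private SlowStartSketch() {}

    /**
     * Returns the fraction (0.0 to 1.0) of downstream tasks that may be
     * scheduled, given the fraction of source tasks that have completed.
     */
    public static double schedulableFraction(double completedSrcFraction,
                                             double minSrcFraction,
                                             double maxSrcFraction) {
        if (minSrcFraction < 0.0 || maxSrcFraction > 1.0
                || minSrcFraction > maxSrcFraction) {
            throw new IllegalArgumentException("expected 0 <= min <= max <= 1");
        }
        if (completedSrcFraction < minSrcFraction) {
            return 0.0; // below the lower bound: nothing is scheduled yet
        }
        if (completedSrcFraction >= maxSrcFraction) {
            return 1.0; // at or above the upper bound: everything may be scheduled
        }
        // Between the bounds: scale linearly from 0.0 to 1.0.
        return (completedSrcFraction - minSrcFraction)
                / (maxSrcFraction - minSrcFraction);
    }

    public static void main(String[] args) {
        double completed = 0.60; // 60% of source tasks have finished
        // Defaults reported in the log above (0.25 / 0.75).
        System.out.println(schedulableFraction(completed, 0.25, 0.75));
        // Values requested in the reproduction query (0.55 / 0.95).
        System.out.println(schedulableFraction(completed, 0.55, 0.95));
    }
}
```

Under this model, a later `min-src-fraction` delays the start of the downstream vertex, which leaves more completed source tasks to base the input-size estimate on. That is the trade-off the ticket wants users to be able to tune.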