[ 
https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280069#comment-17280069
 ] 

okumin commented on HIVE-24485:
-------------------------------

[~gopalv] Thanks. I tried to move parameters and updated the PR.

I have one concern about which parameters we should expose to users for 
tweaking slow-start behavior. I'd appreciate your opinion on that.

Thanks.

> Make the slow-start behavior tunable
> ------------------------------------
>
>                 Key: HIVE-24485
>                 URL: https://issues.apache.org/jira/browse/HIVE-24485
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, Tez
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This ticket would enable users to configure the timing of slow-start with 
> `tez.shuffle-vertex-manager.min-src-fraction` and 
> `tez.shuffle-vertex-manager.max-src-fraction`.
> Hive on Tez currently doesn't honor these parameters, and ShuffleVertexManager 
> always uses the default values.
> If we can tweak them, we can control both the timing at which vertices start 
> and the accuracy of the estimated input size. This is useful when a vertex has 
> tasks that process different amounts of data.
>  
> We can reproduce the issue with this query.
> {code:sql}
> SET hive.tez.auto.reducer.parallelism=true;
> SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism
> SET tez.shuffle-vertex-manager.min-src-fraction=0.55;
> SET tez.shuffle-vertex-manager.max-src-fraction=0.95;
> CREATE TABLE mofu (name string);
> INSERT INTO mofu (name) VALUES ('12345');
> SELECT name, count(*) FROM mofu GROUP BY name;{code}
> The fractions are ignored: the log reports 0.25 and 0.75 rather than the 
> configured 0.55 and 0.95.
> {code:java}
> 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] 
> |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: 
> 0.75 auto: true desiredTaskIput: 256000000
> {code}
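
For readers unfamiliar with slow-start, the following is a simplified Python sketch of how the two fractions can be thought of as gating a downstream vertex: nothing starts below the minimum fraction of completed source tasks, everything may start at or above the maximum, and the share ramps up linearly in between. The function name `fraction_to_schedule` and the exact linear ramp are illustrative assumptions, not Tez's actual implementation.

```python
def fraction_to_schedule(completed_fraction, min_frac=0.25, max_frac=0.75):
    """Return the share of downstream tasks allowed to start (simplified model).

    completed_fraction: fraction of source tasks that have finished (0.0 to 1.0).
    min_frac / max_frac: stand-ins for
        tez.shuffle-vertex-manager.min-src-fraction and
        tez.shuffle-vertex-manager.max-src-fraction.
    The defaults mirror the minFrac/maxFrac values in the log above.
    """
    if not 0.0 <= min_frac <= max_frac <= 1.0:
        raise ValueError("expected 0 <= min_frac <= max_frac <= 1")
    if completed_fraction < min_frac:
        return 0.0  # too early: no downstream task starts yet
    if completed_fraction >= max_frac:
        return 1.0  # enough sources are done: all downstream tasks may start
    # Assumed linear ramp between the two thresholds.
    return (completed_fraction - min_frac) / (max_frac - min_frac)


if __name__ == "__main__":
    # Compare the default fractions with the ones from the reproduction query.
    for done in (0.2, 0.5, 0.8, 0.95):
        print(done,
              fraction_to_schedule(done),
              fraction_to_schedule(done, 0.55, 0.95))
```

Under this model, with the fractions from the query (0.55 and 0.95) a downstream vertex waits until more than half of its source tasks have finished, so it starts later but with more output statistics available for estimating its input size.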



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
