[jira] [Commented] (FLINK-31065) Support more split assigner strategy for batch job

yunfan (Jira) Wed, 15 Feb 2023 20:09:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-31065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689497#comment-17689497
 ]


yunfan commented on FLINK-31065:
--------------------------------

[~luoyuxia] Yes, hive is the main usage of this strategy. And I don't find any 
other source that need this strategy.

The main purpose is reduce the cost of failover. 

> Support more split assigner strategy for batch job
> --------------------------------------------------
>
>                 Key: FLINK-31065
>                 URL: https://issues.apache.org/jira/browse/FLINK-31065
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: yunfan
>            Priority: Major
>
> Currently flink use LocatableInputSplitAssigner as the default split 
> assigner. 
> Which splits the task will consume are dynamic in the runtime.
> It is not a good strategy in the batch mode.
> For example, we have 100 splits and the job has 100 tasks. 
> When the job start, we don't have enough resource to start the 100 tasks, 
> only 10 tasks started in the first time.( it is a common case in batch mode)
> These 10 tasks will consume all splits, and other 90 tasks will reach finish 
> state.
> This is obviously not a good idea in batch mode. 
> In extreme cases, 99 tasks finished, only one task running, and if this 
> running task failed,
> it will take much time to rerun this task( Because it need to consume it's 10 
> split again).
> Spark will bind splits to it's task, I think it is a better way in the batch 
> mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-31065) Support more split assigner strategy for batch job

Reply via email to