[ https://issues.apache.org/jira/browse/FLINK-31065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689066#comment-17689066 ]
luoyuxia commented on FLINK-31065:
----------------------------------

[~yunfanfight...@foxmail.com] Thanks for raising this. I noticed it before as well and tried a static split assignment in our internal TPC-DS benchmark. Unfortunately, I did not see any performance improvement, so I gave up. But I do think it is a good point with respect to failover.
Apart from introducing a static split assignment strategy, do you also plan to make some sources, such as Hive, use it?

> Support more split assigner strategy for batch job
> --------------------------------------------------
>
>                 Key: FLINK-31065
>                 URL: https://issues.apache.org/jira/browse/FLINK-31065
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: yunfan
>            Priority: Major
>
> Currently Flink uses LocatableInputSplitAssigner as the default split
> assigner, so the splits a task will consume are decided dynamically at
> runtime.
> This is not a good strategy in batch mode.
> For example, suppose we have 100 splits and the job has 100 tasks.
> When the job starts, we do not have enough resources to start all 100 tasks,
> so only 10 tasks start at first (a common case in batch mode).
> These 10 tasks will consume all the splits, and the other 90 tasks will reach
> the finished state without processing anything.
> This is obviously not a good idea in batch mode.
> In extreme cases, 99 tasks have finished and only one task is still running.
> If this running task fails, it will take a long time to rerun it (because it
> needs to consume its 10 splits again).
> Spark binds splits to its tasks, which I think is a better approach in batch
> mode.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
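
For illustration only, the static binding discussed in the issue description could be sketched as below. This is a minimal sketch under stated assumptions, not Flink's API and not the reporter's proposed implementation: the class name `StaticSplitAssignment`, the method `assign`, and the round-robin rule are all hypothetical. It only shows the idea that each split is bound to a task index before any task starts, so a task that starts late (or is rerun after a failure) still reads only its own splits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of static split binding; not part of Flink. */
public class StaticSplitAssignment {

    /**
     * Binds split i to task (i % numTasks) up front (round-robin is an
     * assumption made for this sketch). The returned list has one entry per
     * task, holding the indices of the splits bound to that task.
     */
    public static List<List<Integer>> assign(int numSplits, int numTasks) {
        if (numSplits < 0 || numTasks <= 0) {
            throw new IllegalArgumentException("numSplits must be >= 0 and numTasks > 0");
        }
        List<List<Integer>> plan = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            plan.add(new ArrayList<>());
        }
        for (int split = 0; split < numSplits; split++) {
            plan.get(split % numTasks).add(split);
        }
        return plan;
    }

    public static void main(String[] args) {
        // The example from the issue: 100 splits and 100 tasks.
        List<List<Integer>> plan = assign(100, 100);
        // Every task owns exactly one split, no matter how many tasks
        // happen to be running at the same time.
        System.out.println("task 0 -> " + plan.get(0));
        System.out.println("task 99 -> " + plan.get(99));
    }
}
```

With such a fixed plan, rerunning a failed task means re-reading only the splits in that task's own list, which is the failover benefit mentioned in the comment above.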