Thanks for your reply! But doesn't Flink use streaming to perform batch calculations? As you said above, to some extent it is the same as real batch computing, e.g. Spark.
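By the way, for anyone following along: the shuffle-mode change suggested below can be made via the SQL client's SET statement (a sketch; the exact SET syntax varies slightly across Flink versions, so check the docs for yours):

```sql
-- Make all network shuffles in batch jobs pipelined, so downstream
-- subtasks start together with upstream ones (Flink >= 1.12 value;
-- on Flink <= 1.11 use 'pipelined' instead).
SET 'table.exec.shuffle-mode' = 'ALL_EDGES_PIPELINED';
```

Note that with pipelined shuffles all subtasks run simultaneously, so the cluster must provide enough task slots for the sum of all subtask parallelisms.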
Caizhi Weng <tsreape...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:53 PM:

> My previous mail intends to answer what is needed for all subtasks in a
> batch job to run simultaneously. To just run a batch job, the number of task
> slots can be as small as 1. In this case each parallelism of each subtask
> will run one by one.
>
> Also, the scheduling of the subtasks depends on the shuffle mode
> (table.exec.shuffle-mode). By default all network shuffles in batch jobs
> are blocking, which means that the downstream subtasks will only start
> running after the upstream subtasks finish. To run all subtasks
> simultaneously you should set it to "pipelined" (Flink <= 1.11) or
> "ALL_EDGES_PIPELINED" (Flink >= 1.12).
>
> Caizhi Weng <tsreape...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:47 PM:
>
>> Hi!
>>
>> If you mean batch SQL, then you'll need to prepare enough task slots for
>> all subtasks. The number of task slots needed is the sum of the
>> parallelisms of all subtasks, as there is no slot reuse in batch jobs.
>>
>> lec ssmi <shicheng31...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:13 PM:
>>
>>> And my Flink version is 1.11.0.
>>>
>>> lec ssmi <shicheng31...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:11 PM:
>>>
>>>> Hi:
>>>> I'm not familiar with the batch API, and I wrote a program just like
>>>> "insert into tab_a select * from tab_b".
>>>> From the picture, there are only two tasks: one is the source task,
>>>> which is in the RUNNING state, and the other is the sink task, which is
>>>> always in the CREATED state.
>>>> According to the logs, the source task is now reading the file I
>>>> specified; in other words, it is working normally.
>>>> Doesn't Flink start working only after all operators are initialized?
>>>>
>>>> [image: image.png]