My previous mail was meant to answer what is needed for all subtasks in a
batch job to run simultaneously. To merely run a batch job, the number of
task slots can be as small as 1; in that case the parallel instances of
each subtask will run one after another.
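(For example, if the source and the sink each had a parallelism of 2,
running all four subtasks at the same time would require 4 slots, whereas
with a single slot they would simply be scheduled one after another.)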

Also, the scheduling of the subtasks depends on the shuffle mode
(table.exec.shuffle-mode). By default all network shuffles in batch jobs
are blocking, which means that the downstream subtasks will only start
running after the upstream subtasks finish. To run all subtasks
simultaneously you should set it to "pipelined" (Flink <= 1.11) or
"ALL_EDGES_PIPELINED" (Flink >= 1.12).

Caizhi Weng <tsreape...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:47 PM:

> Hi!
>
> If you mean batch SQL, then you'll need to prepare enough task slots for
> all subtasks. The number of task slots needed is the sum of the
> parallelisms of all subtasks, as there is no slot reuse in batch jobs.
>
lec ssmi <shicheng31...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:13 PM:
>
>> And my Flink version is 1.11.0.
>>
>> lec ssmi <shicheng31...@gmail.com> wrote on Tue, Sep 7, 2021 at 2:11 PM:
>>
>>> Hi:
>>>    I'm not familiar with the batch API, and I wrote a program just like
>>> "insert into tab_a select * from tab_b".
>>>    From the picture, there are only two tasks: one is the source task,
>>> which is in the RUNNING state, and the other is the sink task, which is
>>> always in the CREATED state.
>>>    According to the logs, the source task is currently reading the file I
>>> specified; in other words, it is working normally.
>>>    Doesn't Flink only start working after all operators are initialized?
>>>
>>>
>>> [image: image.png]
>>>
>>
