Thanks,

My job is simple. I am using the Table API (rough sketch below):
1. Read from HDFS.
2. Deserialize JSON to a POJO and convert to a Table.
3. Group by some columns.
4. Convert back to a DataSet and write back to HDFS.
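
This is just a minimal sketch of the job, not my real code: the Event POJO,
the Jackson-based parsing, the column names (userId, amount) and the HDFS
paths are all placeholders I made up.

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class GroupByJob {

    // Placeholder POJO; the real field names depend on the JSON schema.
    public static class Event {
        public String userId;
        public double amount;
        public Event() {} // Flink POJOs need a public no-arg constructor
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // 1. Read raw JSON lines from HDFS.
        DataSet<String> lines = env.readTextFile("hdfs:///input/events");

        // 2. Deserialize each line into the POJO and register it as a Table.
        DataSet<Event> events = lines.map(new RichMapFunction<String, Event>() {
            private transient ObjectMapper mapper;

            @Override
            public void open(Configuration parameters) {
                mapper = new ObjectMapper();
            }

            @Override
            public Event map(String line) throws Exception {
                return mapper.readValue(line, Event.class);
            }
        });
        Table table = tEnv.fromDataSet(events);

        // 3. Group by some columns and aggregate.
        Table grouped = table.groupBy("userId").select("userId, amount.sum as total");

        // 4. Convert back to a DataSet and write back to HDFS.
        DataSet<Row> result = tEnv.toDataSet(grouped, Row.class);
        result.writeAsText("hdfs:///output/totals");

        env.execute("json-group-by");
    }
}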

In the Web UI I can see at least the first 3 of these running concurrently,
which sort of makes sense. From your answer, though, I understood that Flink
will run step 1 first, then, once it is complete, the map (and perhaps the
grouping), then the grouping, and finally the write. That would mean only 1
task running at a time. This doesn't seem right to me, or I misunderstood
what you said.

So here, if my group-by is slow, should I expect some sort of backpressure
on the deserialisation step, or maybe on the HDFS read itself?
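
If pipelined data exchanges are what let the stages overlap, would forcing
blocking exchanges, roughly as sketched below, make them run strictly one
after another instead? (This is just my reading of the DataSet docs on
ExecutionMode, not something I have verified.)

import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ExchangeModeSketch {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // The default, ExecutionMode.PIPELINED, streams data between tasks
        // where possible, so consecutive stages run concurrently and a slow
        // consumer can push back on its producer.
        // ExecutionMode.BATCH makes data exchanges blocking, so a stage only
        // starts consuming after its input has been fully produced.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);
    }
}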

Thanks

On Wed, Aug 29, 2018 at 11:03 AM Zhijiang(wangzhijiang999) <
wangzhijiang...@aliyun.com> wrote:

> Backpressure arises when downstream and upstream tasks are running
> concurrently and the downstream is slower than the upstream.
> In a streaming job, the schedule mode schedules both sides concurrently,
> so backpressure may occur.
> For a batch job, the default schedule mode is LAZY_FROM_SOURCE, as I
> remember, which means the downstream is scheduled only after the upstream
> finishes. A slower downstream therefore cannot block the upstream, so
> backpressure may not exist in this case.
>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> From: Darshan Singh <darshan.m...@gmail.com>
> Sent: Wednesday, August 29, 2018, 16:20
> To: user <user@flink.apache.org>
> Subject: Backpressure? for Batches
>
> I faced an issue with backpressure in streaming jobs. I was wondering if
> we could face the same with batch jobs as well.
>
> In theory it should be possible. But in the Web UI's backpressure tab for
> a batch job, I saw only the task status and no backpressure status like
> "OK" etc.
>
> So I was wondering if backpressure is a thing for batch jobs. If yes, how
> do we reduce it, especially when reading from HDFS?
>
> Thanks
>
