The semantics of LAZY_FROM_SOURCE are that tasks are scheduled /when there is data to be consumed/, i.e. once the first record has been emitted by the previous operator. As such, back-pressure exists in batch just like in streaming.
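
To make the mechanism concrete, here is a toy sketch in plain Java. It is not Flink code: the class BackPressureSketch and the method simulate are made up for this illustration, and a bounded queue merely stands in for the network buffers between two pipelined tasks. A fast upstream thread and a slow downstream thread run concurrently; once the buffer is full, the upstream has to wait, which is what back-pressure amounts to.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Flink.
public class BackPressureSketch {

    /**
     * Runs a fast producer and a slow consumer concurrently over a bounded buffer.
     * Returns {recordsConsumed, timesProducerFoundBufferFull}.
     */
    public static int[] simulate(int records, int bufferCapacity, long consumerDelayMillis) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);
        AtomicInteger consumed = new AtomicInteger();
        AtomicInteger bufferFull = new AtomicInteger();

        // "Upstream" task: emits records as fast as the buffer allows.
        Thread upstream = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    if (!buffer.offer(i)) {      // buffer is full: downstream is too slow
                        bufferFull.incrementAndGet();
                        buffer.put(i);           // block until a slot frees up
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // "Downstream" task: starts consuming as soon as data is available,
        // but needs some time per record.
        Thread downstream = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    buffer.take();
                    Thread.sleep(consumerDelayMillis);
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        upstream.start();
        downstream.start();
        try {
            upstream.join();
            downstream.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("simulation interrupted", e);
        }
        return new int[] {consumed.get(), bufferFull.get()};
    }

    public static void main(String[] args) {
        int[] result = simulate(20, 2, 5);
        System.out.println("consumed=" + result[0]
                + ", upstream found the buffer full " + result[1] + " times");
    }
}
```

With a small buffer and a slow consumer, the second number is greater than zero: the upstream is throttled to the speed of the downstream. The exact count depends on thread timing, so treat it as indicative only.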

On 29.08.2018 11:39, Darshan Singh wrote:
Thanks,

My job is simple. I am using the Table API:
1. Read from HDFS.
2. Deserialize JSON to POJOs and convert to a table.
3. Group by some columns.
4. Convert back to a DataSet and write back to HDFS.

In the WebUI I can see at least the first 3 running concurrently, which sort of makes sense. From your answer I understood that Flink will first do step 1; once that is completed it will do the map (or the grouping as well), then the grouping, and finally the write. Thus, there should be only 1 task running at a time. This doesn't seem right to me, or I misunderstood what you said.

So here, if my group by is slow, then I would expect some sort of back pressure on the deserialization part, or maybe on the reading from HDFS itself?

Thanks

On Wed, Aug 29, 2018 at 11:03 AM Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com> wrote:

    Backpressure arises when downstream and upstream are running
    concurrently and the downstream is slower than the upstream.
    In a streaming job, the schedule mode schedules both sides
    concurrently, so backpressure may exist.
    For a batch job, the default schedule mode is LAZY_FROM_SOURCE, as
    far as I remember, which means the downstream is scheduled after the
    upstream finishes, so a slower downstream will not block the
    upstream, and backpressure may not exist in this case.

    Best,
    Zhijiang

        ------------------------------------------------------------------
        From: Darshan Singh <darshan.m...@gmail.com>
        Sent: Wednesday, August 29, 2018, 16:20
        To: user <user@flink.apache.org>
        Subject: Backpressure? for Batches

        I have run into back pressure in streaming jobs. I was
        wondering if we could face the same with batch jobs as well.

        In theory it should be possible. But in the backpressure tab of
        the Web UI, for batch jobs I only saw the task status and no
        backpressure status such as "OK".

        So I was wondering if backpressure is a thing for batch jobs. If
        yes, how do we reduce it, especially if I am reading from HDFS?

        Thanks


