Re: Backpressure? for Batches

Chesnay Schepler Wed, 29 Aug 2018 03:11:53 -0700

The semantics for LAZY_FROM_SOURCE are that tasks are scheduled /when there is data to be consumed/, i.e. once the first record has been emitted by the previous operator. As such, back-pressure exists in batch just like in streaming.
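
To illustrate the idea, here is a minimal sketch in plain Java. It is not Flink code and uses no Flink API; the class name, the END marker and all parameters are made up for the example. It assumes a bounded buffer between two operators: the consumer is started only after the producer has emitted its first record, and once the buffer is full the faster producer has to wait for the slower consumer, which is the back-pressure effect.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Flink.
final class BackpressureSketch {
    private static final int END = -1; // end-of-input marker (assumption of this sketch)

    /** Returns {records consumed, number of times the producer found the buffer full}. */
    static int[] run(int records, int bufferSize, long consumerDelayMs) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferSize);
        AtomicInteger consumed = new AtomicInteger();

        // Downstream operator: slow, takes one record at a time from the buffer.
        Thread consumer = new Thread(() -> {
            try {
                while (buffer.take() != END) {
                    Thread.sleep(consumerDelayMs);
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        int stalls = 0;
        try {
            // Upstream operator: emits records as fast as it can.
            for (int i = 0; i < records; i++) {
                if (!buffer.offer(i)) { // buffer full: the producer is back-pressured
                    stalls++;
                    buffer.put(i);      // block until the consumer frees a slot
                }
                if (i == 0) {
                    consumer.start();   // downstream starts once the first record exists
                }
            }
            buffer.put(END);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return new int[] {consumed.get(), stalls};
    }

    public static void main(String[] args) {
        int[] result = run(50, 4, 2);
        System.out.println("consumed=" + result[0] + ", producer stalls=" + result[1]);
    }
}
```

With a small buffer and a slow consumer, the stall count is expected to be greater than zero, while all records still arrive downstream. The exact number of stalls depends on thread timing.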


On 29.08.2018 11:39, Darshan Singh wrote:

Thanks,


My job is simple. I am using the Table API:
1. Read from HDFS.
2. Deserialize JSON to POJO and convert to a table.
3. Group by some columns.
4. Convert back to a DataSet and write back to HDFS.

In the WebUI I can see at least the first 3 running concurrently, which sort of makes sense. From your answer I understood that Flink will first do number 1; once that is completed it will do the map (or the grouping as well), then the grouping, and finally the write. Thus, there should be 1 task running at a time. This doesn't seem right to me, or I misunderstood what you said.

So here, if my group by is slow, then I expect some sort of backpressure on the deserialization part, or maybe on the read from HDFS itself?


Thanks

On Wed, Aug 29, 2018 at 11:03 AM Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com> wrote:


    Backpressure arises when downstream and upstream are
    running concurrently and the downstream is slower than the upstream.
    In a streaming job, the schedule mode schedules both sides
    concurrently, so backpressure may exist.
    For a batch job, the default schedule mode is LAZY_FROM_SOURCE, as
    far as I remember, which means the downstream is scheduled after the
    upstream finishes, so a slower downstream will not block the
    running upstream, and backpressure may not exist in this case.

    Best,
    Zhijiang

        ------------------------------------------------------------------
        From: Darshan Singh <darshan.m...@gmail.com>
        Sent: Wednesday, 29 August 2018, 16:20
        To: user <user@flink.apache.org>
        Subject: Backpressure? for Batches

        I faced an issue with back pressure in streams. I was
        wondering if we could face the same with batches as well.

        In theory it should be possible. But in the Web UI's
        backpressure tab for batches I saw that it was just
        showing the task status and no status like "OK" etc.

        So I was wondering if backpressure is a thing for batches. If
        yes, how do we reduce it, especially if I am reading from HDFS?

        Thanks
