The semantics of LAZY_FROM_SOURCE are that tasks are scheduled /when
there is data to be consumed/, i.e. once the first record has been emitted by
the previous operator. As such, back-pressure exists in batch just like
in streaming.
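
A minimal sketch of that scheduling rule, written in plain Python rather than against any Flink API. It is a tick-based toy model: the function name, the parameters and the one-record-per-tick assumption are all hypothetical and only serve the illustration. The downstream task is scheduled on the tick in which the upstream emits its first record, and a bounded buffer sits between the two.

```python
from collections import deque


def simulate_lazy_from_source(num_records, buffer_capacity, consumer_period):
    """Toy model (not Flink code): the upstream tries to emit one record per
    tick; the downstream is scheduled when the first record is emitted and then
    consumes one record every `consumer_period` ticks."""
    buffer = deque()
    emitted = consumed = stalled_ticks = 0
    downstream_scheduled_at = None
    upstream_finished_at = None
    tick = 0
    while consumed < num_records:
        # Upstream: emit if the bounded buffer has room, otherwise stall.
        if emitted < num_records:
            if len(buffer) < buffer_capacity:
                buffer.append(emitted)
                emitted += 1
                if downstream_scheduled_at is None:
                    # First record emitted: the downstream gets scheduled now.
                    downstream_scheduled_at = tick
                if emitted == num_records:
                    upstream_finished_at = tick
            else:
                stalled_ticks += 1  # back-pressure: no room in the buffer
        # Downstream: runs only once scheduled, at its own (slower) pace.
        if (downstream_scheduled_at is not None
                and tick % consumer_period == 0 and buffer):
            buffer.popleft()
            consumed += 1
        tick += 1
    return {
        "downstream_scheduled_at": downstream_scheduled_at,
        "upstream_finished_at": upstream_finished_at,
        "stalled_ticks": stalled_ticks,
    }


slow = simulate_lazy_from_source(num_records=10, buffer_capacity=2, consumer_period=3)
print(slow)
```

In this model the downstream starts well before the upstream finishes, and with a slow consumer the upstream spends ticks stalled on a full buffer. With `consumer_period=1` the stalls disappear.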
On 29.08.2018 11:39, Darshan Singh wrote:
Thanks,
My job is simple. I am using the Table API:
1. Read from HDFS.
2. Deserialize JSON to POJO and convert to a table.
3. Group by some columns.
4. Convert back to a DataSet and write back to HDFS.
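
The shape of these four steps can be sketched with plain Python generators. This is only an analogy and not the Flink Table API: the in-memory `raw` list stands in for the HDFS input, the `user` field and all function names are made up for the example, and generators pull records while Flink pushes them through network buffers. The `trace` list records the order in which the stages actually do work.

```python
import json
from collections import defaultdict

trace = []  # order in which the stages touch records


def read_lines(lines):
    # Step 1 stand-in: an in-memory list instead of an HDFS file.
    for line in lines:
        trace.append("read")
        yield line


def deserialize(lines):
    # Step 2: JSON text to a record (a dict here, a POJO in the real job).
    for line in lines:
        record = json.loads(line)
        trace.append("parse")
        yield record


def group_by_count(records, key):
    # Step 3: the grouping has to see all input before it can emit results.
    counts = defaultdict(int)
    for record in records:
        counts[record[key]] += 1
    trace.append("group")
    yield from sorted(counts.items())


def write(results):
    # Step 4 stand-in: format the output rows instead of writing to HDFS.
    return [f"{key},{count}" for key, count in results]


raw = [
    '{"user": "a", "value": 1}',
    '{"user": "b", "value": 2}',
    '{"user": "a", "value": 3}',
]
output = write(group_by_count(deserialize(read_lines(raw)), "user"))
print(output)
print(trace)
```

The trace interleaves reading and parsing while the grouping is still consuming, which matches the observation that the first three steps are active at the same time; only the write has to wait for the grouping to finish.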
In the WebUI I can see at least the first 3 running concurrently, which
sort of makes sense. From your answer I understood that Flink will do
step 1 first; once that is completed it will do the map (or the grouping as
well), then the grouping, and finally the write. Thus, there should be only 1
task running at any one time. This doesn't seem right to me, or I
misunderstood what you said.
So here, if my group by is slow, then I expect some sort of back
pressure on the deserialise part, or maybe on the read from HDFS itself?
Thanks
On Wed, Aug 29, 2018 at 11:03 AM Zhijiang(wangzhijiang999)
<wangzhijiang...@aliyun.com> wrote:
Backpressure arises when the downstream and upstream tasks are
running concurrently and the downstream is slower than the upstream.
In a streaming job, the schedule mode schedules both sides
concurrently, so backpressure may occur.
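
The mechanism described above can be reproduced with two real threads and a bounded queue. This is a hedged sketch in plain Python, not Flink's network stack; `run_pipeline`, its parameters and the sleep-based "slow" consumer are assumptions made for the example.

```python
import queue
import threading
import time


def run_pipeline(num_records=20, capacity=2, consumer_delay=0.005):
    """Upstream (this thread) and downstream (worker thread) run concurrently
    and exchange records through a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)
    sentinel = object()  # marks the end of the input
    consumed = []
    blocked_puts = 0

    def downstream():
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            time.sleep(consumer_delay)  # the downstream is the slow side
            consumed.append(item)

    worker = threading.Thread(target=downstream)
    worker.start()
    for record in range(num_records):
        try:
            buffer.put_nowait(record)
        except queue.Full:
            # Buffer is full: the upstream has to wait for the downstream.
            blocked_puts += 1
            buffer.put(record)
    buffer.put(sentinel)
    worker.join()
    return consumed, blocked_puts


consumed, blocked_puts = run_pipeline()
print(len(consumed), "records consumed;", blocked_puts, "puts had to wait")
```

Every record still arrives in order, but the fast upstream is repeatedly forced to wait on the full buffer, which is what a backpressure monitor reports.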
For a batch job, the default schedule mode is LAZY_FROM_SOURCE, as far as I
remember, which means the downstream is scheduled after the
upstream finishes, so a slower downstream does not block the
upstream, and backpressure may not occur in this case.
Best,
Zhijiang
------------------------------------------------------------------
From: Darshan Singh <darshan.m...@gmail.com>
Sent: Wednesday, 29 August 2018, 16:20
To: user <user@flink.apache.org>
Subject: Backpressure? for Batches
I faced an issue with back pressure in streams. I was
wondering if we could face the same with batches as well.
In theory it should be possible. But in the backpressure tab of the
Web UI, for batches I only saw the task status and no
backpressure status such as "OK".
So I was wondering if backpressure is a thing for batches. If
yes, how do we reduce it, especially if I am reading from HDFS?
Thanks