Thanks Fabian, that pretty much explains what's going on.

2016-12-19 12:19 GMT+03:00 Fabian Hueske <fhue...@gmail.com>:
> Hi Yury,
>
> Flink's operators start processing as soon as they receive data. If an
> operator produces more data than its successor task can process, the data
> is buffered in Flink's network stack, i.e., its network buffers.
> The backpressure mechanism kicks in when all network buffers are in use
> and no more data can be buffered. In this case, a producing task will block
> until a network buffer becomes available.
>
> If the window operator in your job aggregates the data, only the
> aggregates will be buffered.
> This might explain why the first operators of the job are able to start
> processing while the FlatMap operator is still setting itself up.
>
> Best,
> Fabian
>
> 2016-12-17 13:42 GMT+01:00 Yury Ruchin <yuri.ruc...@gmail.com>:
>
>> Hi all,
>>
>> I have a streaming job that essentially looks like this: KafkaSource ->
>> Map -> EventTimeWindow -> RichFlatMap -> CustomSink. The RichFlatMap part
>> does some heavy lifting in open(), so that the open() call blocks for
>> several minutes. I assumed that until open() returns, the backpressure
>> mechanism would slow down the entire upstream up to the KafkaSource, so
>> that no new records would be emitted into the pipeline until the
>> RichFlatMap is ready. What I actually observe is that the KafkaSource, Map
>> and EventTimeWindow continue processing - the in/out records and in/out MB
>> counters keep increasing. The RichFlatMap and its downstream CustomSink
>> stay at 0 as expected, until the RichFlatMap is actually done with open().
>> The backpressure monitor in the Flink UI shows "OK" for all operators.
>>
>> Why doesn't the backpressure mechanism work in this case?
>>
>> Thanks,
>> Yury
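
For anyone finding this thread later, here is a minimal sketch of the kind of
RichFlatMap described above, with a long-running open(). The class name, the
String types and loadExpensiveResource() are hypothetical placeholders; only
the Flink API itself (RichFlatMapFunction, open, flatMap, Collector) is real.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class HeavyInitFlatMap extends RichFlatMapFunction<String, String> {

    private transient Object model; // stands in for whatever open() builds

    @Override
    public void open(Configuration parameters) throws Exception {
        // The heavy lifting happens here and blocks this subtask for minutes.
        // Upstream tasks keep running and write into their network buffers;
        // backpressure only appears once those buffers are exhausted. If a
        // windowed aggregation upstream emits only a few aggregate records,
        // the buffers may never fill while open() is still running.
        model = loadExpensiveResource();
    }

    @Override
    public void flatMap(String value, Collector<String> out) {
        // Normal per-record processing once open() has returned.
        out.collect(value);
    }

    private Object loadExpensiveResource() throws InterruptedException {
        // Hypothetical placeholder for the real initialization work.
        Thread.sleep(5 * 60 * 1000L);
        return new Object();
    }
}

With something like this at the end of the pipeline, the KafkaSource, Map and
window tasks keep running and filling network buffers while open() blocks,
which matches Fabian's explanation of why no backpressure is reported.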