Hi Vishal,
your assumptions sound reasonable to me. The community is currently
working on more fine-grained back pressure with credit-based flow
control. It is on the roadmap for 1.5 [1]/[2]. I will loop in Nico, who
might be able to tell you more about the details. Until then, I guess you
have to implement a custom source (or adapt an existing one) to let the
data flow in more realistically.
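For what it's worth, a pacing loop along those lines could look like the sketch below. This is plain Java, not the actual Flink SourceFunction API; the class and method names are made up for illustration. It replays stored events with sleeps proportional to the event-time gaps, so a 7-day topic is consumed at a realistic rate instead of as fast as the fetchers can pull:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch (hypothetical names, not Flink code): replay records pacing
// emission by event-time gaps. In a real job this loop would live inside
// a custom source's run() method.
public class PacedReplay {

    public record Event(long eventTimeMs, String payload) {}

    // speedup = 1.0 replays in real time; 10.0 replays ten times faster.
    public static void replay(List<Event> events, double speedup,
                              Consumer<Event> emit) {
        Long prevEventTime = null;
        for (Event e : events) {
            if (prevEventTime != null) {
                // Sleep for the (scaled) event-time gap between records.
                long gapMs = (long) ((e.eventTimeMs() - prevEventTime) / speedup);
                if (gapMs > 0) {
                    try {
                        Thread.sleep(gapMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
            prevEventTime = e.eventTimeMs();
            emit.accept(e);
        }
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(0, "a"), new Event(100, "b"), new Event(250, "c"));
        // 10x speedup: 250 ms of event time replayed in roughly 25 ms.
        replay(events, 10.0, e -> System.out.println(e.payload()));
    }
}
```

The same idea works per partition, which also keeps the per-partition watermarks from drifting apart as far as they do in an unpaced replay.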
Regards,
Timo
[1]
http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html
[2] https://www.youtube.com/watch?v=scStdhz9FHc
On 1/2/18 at 4:02 PM, Vishal Santoshi wrote:
I ran a simulation with session windows (in two modes) and let it run
for about 12 hours:
1. Replay, with a Kafka topic with a 7-day retention as the source
(earliest)
2. Starting the pipe with the Kafka source at latest
I saw results that differed dramatically.
On replay the pipeline stalled after a good ramp-up, while in the second
case the pipeline hummed along without issues. For the same time period,
significantly more data was consumed in the second case; in the first
case the WM progression stalled with no hint of resolution (the incoming
data on the source topic far outstripped the WM progression). I think I
know the reason, and this is my hypothesis.
In replay mode the number of open windows has no upper bound. While
buffer exhaustion (together with the data in flight relative to the
watermark) is what eventually throttles consumption, it does not really
limit the number of open windows, and in fact creates windows for
futuristic data (future relative to the current WM). So if partition x
has data at watermark time t(x) and partition y at watermark time t(y),
with t(x) << t(y), the overall watermark is t(x), yet nothing
significantly throttles consumption from partition y (nor from x, in
fact). AFAIK the bounded-buffer approach does not give the fine-grained
control one would hope for, which implies there are far more open
windows than the system can handle. That leads to the pathological case
where the buffers fill up (I believe far too late) and throttling kicks
in, but the WM does not advance, and the windows whose firing could
ease the glut cannot proceed. In replay mode the sheer amount of data
means the Fetchers keep pulling at the maximum rate the open-ended
buffer approach allows.
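To make the skew concrete, here is a toy model of the effect (hypothetical names, plain Java, not Flink code): the operator watermark is the minimum of the per-partition watermarks, so every window the fast partition opens beyond that minimum sits there unfireable until the slow partition catches up.

```java
// Toy model of the described skew: overall WM = min of per-partition WMs,
// so windows opened ahead of it by the fast partition cannot fire.
public class WatermarkSkew {

    // Count windows (one per windowStepMs of event time) whose start lies
    // beyond the overall watermark and which therefore stay open.
    public static int unfireableWindows(long wmSlow, long wmFast,
                                        long windowStepMs) {
        long overallWm = Math.min(wmSlow, wmFast);
        int open = 0;
        for (long t = 0; t <= Math.max(wmSlow, wmFast); t += windowStepMs) {
            if (t > overallWm) {
                open++;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        long wmX = 1_000;    // slow partition: t(x)
        long wmY = 500_000;  // fast partition: t(y), with t(x) << t(y)
        System.out.println("overall WM = " + Math.min(wmX, wmY));
        System.out.println("unfireable windows = "
                + unfireableWindows(wmX, wmY, 10_000)); // prints 50
    }
}
```

The unbounded part is that nothing stops wmY from racing further ahead, so the count of open windows grows with the skew, not with the buffer size.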
My question thus is: is there any way to get finer control over back
pressure, where consumption from a source is throttled preemptively
(for example by decreasing the buffers associated with a pipe, or the
size allocated to them), or via sleeps in the Fetcher code, so that
performance can be aligned with real-time consumption characteristics?
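As a strawman for what such preemptive throttling could mean (purely hypothetical, not an existing Flink API), a check like the following in the fetch loop would pause a partition whose local event time runs too far ahead of the global watermark:

```java
// Hypothetical fetcher-side throttle: pause polling a partition whose
// local event time is more than MAX_AHEAD_MS ahead of the global WM.
public class AlignedFetch {

    static final long MAX_AHEAD_MS = 60_000;  // allowed event-time lead

    // Returns how long the fetch loop should sleep before polling this
    // partition again (0 means keep fetching), capped at 1 s per check.
    public static long throttleMs(long partitionEventTime,
                                  long globalWatermark) {
        long ahead = partitionEventTime - globalWatermark;
        return ahead > MAX_AHEAD_MS
                ? Math.min(ahead - MAX_AHEAD_MS, 1_000)
                : 0;
    }

    public static void main(String[] args) {
        System.out.println(throttleMs(500_000, 1_000));  // far ahead -> 1000
        System.out.println(throttleMs(10_000, 9_000));   // within bound -> 0
    }
}
```

That is roughly the shape of throttle the WM-skew scenario above seems to call for: it bounds how far any partition can run ahead of the overall watermark, rather than waiting for network buffers to fill.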
Regards,
Vishal.