Hi Fabian,

This happens to me even when the restore is immediate, so there isn't much data in Kafka to catch up on (5 minutes' worth at most).
Regards,
Kien

On Jul 13, 2017, 23:40, Fabian Hueske <fhue...@gmail.com> wrote:
>I would guess that this is quite usual because the job has to "catch up" work.
>For example, if you took a savepoint two days ago and restore the job today, the input data of the last two days has been written to Kafka (assuming Kafka as the source) and needs to be processed.
>The job will now read as fast as possible from Kafka to catch up to the present. This means the data is ingested much faster (as fast as Kafka can read and ship it) than during regular processing (as fast as your sources produce it).
>The processing speed is bound by your Flink job, which means there will be backpressure.
>
>Once the job has caught up, the backpressure should disappear.
>
>Best, Fabian
>
>2017-07-13 15:48 GMT+02:00 Kien Truong <duckientru...@gmail.com>:
>
>> Hi all,
>>
>> I have one job where backpressure is significantly higher after resuming from a savepoint.
>>
>> Because that job makes heavy use of stateful functions with RocksDBStateBackend, I suspect that this is the cause of the performance degradation.
>>
>> Has anyone encountered similar issues, or does anyone have any tips for debugging?
>>
>> I'm using Flink 1.3.2 with YARN in detached mode.
>>
>> Regards,
>> Kien
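Fabian's catch-up argument can be illustrated with a toy backlog model (a hypothetical Python sketch with made-up rates, not measurements from any real Flink job): while the restored job is behind, the backlog of pending Kafka records drains at the difference between the job's processing capacity and the live produce rate, and backpressure should last only until the backlog reaches zero.

```python
# Toy model of catch-up after a savepoint restore.
# All rates are illustrative assumptions, in records per second.
def simulate(backlog_records, produce_rate, process_rate, seconds):
    """Track the pending-record backlog at the end of each second.

    While the backlog is positive, the job reads from Kafka as fast
    as it can and is bottlenecked by process_rate (backpressure).
    Once the backlog hits zero, the job only sees the live produce
    rate and backpressure should disappear.
    """
    history = []
    for _ in range(seconds):
        backlog_records = max(0, backlog_records + produce_rate - process_rate)
        history.append(backlog_records)
    return history

# 5 minutes of lag at 1,000 rec/s behind a job that can do 6,000 rec/s:
# the backlog shrinks by 5,000 rec/s and drains in about a minute.
hist = simulate(backlog_records=300_000, produce_rate=1_000,
                process_rate=6_000, seconds=90)
```

In this model the immediacy of the restore only sets the initial backlog; the duration of the backpressure is governed by the gap between processing capacity and produce rate, which may be why even a short lag feels slow if the job's headroom is small.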