Hi,

Which state backend and Flink version are you using? There was a known performance problem with large merging state on RocksDB, caused by inefficiencies in RocksDB's merge operator. All newer versions of Flink include a custom patch for this.
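For a sense of scale, here is a rough sketch of how many sliding windows a single element falls into for the sizes mentioned in the quoted message. It is plain Java with no Flink dependency. The class and method names are made up for illustration, and the assignment rule (windows aligned to multiples of the slide, no offset) is an assumption about the usual sliding-window semantics, not a copy of Flink's code.

```java
import java.time.Duration;

// Hypothetical helper, not part of Flink: counts the sliding windows
// that contain a single event timestamp.
final class SlidingWindowFanOut {

    // Windows have length sizeMs and start every slideMs, aligned to 0 (assumed).
    static long windowsPerElement(long timestampMs, long sizeMs, long slideMs) {
        if (sizeMs <= 0 || slideMs <= 0) {
            throw new IllegalArgumentException("size and slide must be positive");
        }
        // Start of the latest window that still contains the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        long count = 0;
        // Walk backwards over every earlier window start that still covers it.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long slide = Duration.ofSeconds(10).toMillis();
        long ts = 1_495_567_440_000L; // arbitrary example timestamp

        // 24 hours sliding by 10 s
        System.out.println(windowsPerElement(ts, Duration.ofHours(24).toMillis(), slide)); // 8640
        // 30 days sliding by 10 s
        System.out.println(windowsPerElement(ts, Duration.ofDays(30).toMillis(), slide));  // 259200
        // 5 minutes sliding by 10 s
        System.out.println(windowsPerElement(ts, Duration.ofMinutes(5).toMillis(), slide)); // 30
    }
}
```

If every element is added to the state of each window it belongs to, then at ~300 events per second a 24 hour window sliding by 10 s would mean roughly 300 * 8640 = 2,592,000 state updates per second, which would be consistent with the back pressure you describe and with the problem disappearing for windows of a few minutes.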
Best,
Stefan

> On 23.05.2017 at 21:24, Chen Qin <qinnc...@gmail.com> wrote:
>
> Hi there,
>
> I have seen some weird perf issues while running an event-time-based job with
> a large sliding window (24 hours, offset every 10s).
>
> The pipeline looks simple:
> tail a Kafka topic, assign timestamps and watermarks, forward to a large
> sliding window (30 days) that fires every 10 seconds, and print out.
>
> What I have seen first hand was checkpointing getting stuck and taking longer
> than the timeout, despite traffic volume being low (~300 TPS). Looking deeper,
> it seems back pressure kicks in, and the window operator consumes messages
> really slowly and throttles the sources.
>
> I also tried limiting the window time to minutes, and all issues are gone.
>
> Any suggestions on this? My workaround is that I implemented a ProcessFunction
> that keeps a big value state, periodically evaluates it, and emits downstream
> (emulating what a sliding window does).
>
> Thanks,
> Chen