Thanks, David. But I don’t see any solution actually provided in that post.
On Jan 21, 2020, at 7:13 PM, David Magalhães <speeddra...@gmail.com> wrote:

I've found this post on Stack Overflow, where someone complains about a performance drop with keyBy:
https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance

On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan <dharani.sudhar...@outlook.in> wrote:

Hi All,

Currently, I'm running a Flink streaming application with the configuration below.

Task slots: 45
Task Managers: 3
Job Manager: 1
CPUs: 20 per machine

My sample code below:

Process stream: datastream.flatmap().map().process().addsink
Data size: approx. 330 GB
Raw stream: datastream.keyby.window.addsink

When I run the raw stream, the Kafka source reads data in GBs and gets through the 330 GB in 15 minutes. But when I run the process stream, back pressure appears, the source reads data only in MBs, and performance suffers badly. I'm using the filesystem state backend with checkpointing enabled.

I tried debugging the issue and made some changes to the code, like below:

Datastream.keyby.timewindow.reduce.flatmap.keyby.timewindow.reduce.map.keyby.process.addsink

This time the performance improved slightly, but it is still not good, and I noticed memory leaks that cause the Task Managers to go down and the job to terminate.

Any help would be much appreciated.

Thanks,
Dharani.
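For anyone skimming the thread, below is a minimal, self-contained sketch of the two topologies described above. String records, trivial functions and print() stand in for the real Kafka source/sink and business logic; all of those stand-ins are assumptions, not the original job.

    // Sketch of the "raw stream" and "process stream" shapes from the thread
    // (Flink DataStream API, Java). All types and functions are placeholders.
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    public class PipelineSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for the Kafka source.
            DataStream<String> source = env.fromElements("a", "b", "a", "c");

            // "Raw stream" shape: keyBy -> window -> sink.
            // (A reduce is assumed here, since a WindowedStream needs some
            //  window function before a sink can be attached.)
            source.keyBy(value -> value)
                  .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                  .reduce((a, b) -> a + b)
                  .print();

            // "Process stream" shape: flatMap -> map -> process -> sink.
            source.flatMap(new FlatMapFunction<String, String>() {
                      @Override
                      public void flatMap(String value, Collector<String> out) {
                          out.collect(value);
                      }
                  })
                  .map(String::toUpperCase)
                  .process(new ProcessFunction<String, String>() {
                      @Override
                      public void processElement(String value, Context ctx, Collector<String> out) {
                          out.collect(value);
                      }
                  })
                  .print();

            env.execute("pipeline-sketch");
        }
    }

The third variant in the thread chains further keyBy/timeWindow/reduce stages onto the same pattern. Worth keeping in mind: each keyBy forces a network shuffle to repartition the stream by key, which is typically where back pressure first shows up, and is what the linked Stack Overflow post is discussing.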