I've found this post on StackOverflow ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ), where someone complains about a similar performance drop when using keyBy.
On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan <dharani.sudhar...@outlook.in> wrote:

> Hi All,
>
> Currently, I'm running a Flink streaming application with the configuration below.
>
> Task slots: 45
> Task Managers: 3
> Job Manager: 1
> CPU: 20 per machine
>
> My sample code below:
>
> Process stream: datastream.flatmap().map().process().addsink
>
> Data size: 330 GB approx.
>
> Raw stream: datastream.keyby.window.addsink
>
> When I run the raw stream, the Kafka source reads data in GBs and is able to read the 330 GB in 15 minutes.
>
> But when I run the process stream, back pressure is noticeable, the source reads data only in MBs, and there is a huge impact on performance.
>
> I'm using the file state backend with checkpointing enabled.
>
> I tried debugging the issue and made some changes to the code, like below:
>
> Datastream.keyby.timewindow.reduce.flatmap.keyby.timewindow.reduce.map.keyby.process.addsink
>
> This time the performance was slightly improved but still not good, and I noticed memory leaks causing Task Managers to go down and the job to get terminated.
>
> Any help would be much appreciated.
>
> Thanks,
> Dharani.
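
For anyone trying to reproduce or compare, the two topologies described above would look roughly like the sketch below. This is only an illustration of the job shapes, not the actual code: the key selector, the placeholder flatMap/map/process/reduce functions, the Kafka topic and broker address, and the DiscardingSink are all made up.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

import java.util.Properties;

public class PipelineSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder Kafka source; topic and broker address are made up.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092");
        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props));

        // "Raw" stream: keyBy -> timeWindow -> (reduce) -> sink.
        // A window needs some window function before addSink; a trivial reduce stands in here.
        events
                .keyBy(new KeySelector<String, Integer>() {
                    @Override
                    public Integer getKey(String value) {
                        return value.hashCode() % 100;   // placeholder key
                    }
                })
                .timeWindow(Time.minutes(1))
                .reduce((a, b) -> a)                     // placeholder reduce
                .addSink(new DiscardingSink<>());

        // "Process" stream: flatMap -> map -> process -> sink, all with identity placeholders.
        events
                .flatMap((String value, Collector<String> out) -> out.collect(value))
                .returns(Types.STRING)
                .map(value -> value)
                .process(new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<String> out) {
                        out.collect(value);
                    }
                })
                .addSink(new DiscardingSink<>());

        env.execute("pipeline sketch");
    }
}

If the process stream really does nothing heavier than this, the back pressure most likely comes from whatever the real process()/sink do, so it would help to see those operators and the checkpointing settings.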