Hi,

Since there is not enough information, you could first check the back pressure status of the job [1] and find the task which causes the back pressure. Then try to find out why that task processes data slowly. There are many possible reasons, for example:
(1) Does data skew exist, i.e. do some tasks process more input data than the other tasks?
(2) Is the CPU cost very high?
(3) Have the RPC responses started to slow down?
(4) If you choose async lookup mode, the LookupJoin operator needs to buffer some data in state. Which state backend do you use? Does the state backend work fine?
...
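In case it helps with point (3) and (4): if your job does the lookup with the DataStream API rather than a SQL lookup join, a common pattern is Flink's AsyncDataStream, where the capacity parameter bounds the number of in-flight requests. Below is only a minimal sketch under that assumption; the record types and the queryMetadataAsync helper are placeholders for your own HBase / external-DB client, not real APIs.

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class AsyncLookupSketch {

        // Placeholder record types, only for illustration.
        public static class UserRecord { public String key; }
        public static class EnrichedRecord { public String key; public String metadata; }

        public static class MetadataLookup extends RichAsyncFunction<UserRecord, EnrichedRecord> {
            @Override
            public void asyncInvoke(UserRecord input, ResultFuture<EnrichedRecord> resultFuture) {
                // queryMetadataAsync is a hypothetical non-blocking client call
                // (e.g. HBase first, then the external DB on a cache miss).
                CompletableFuture<String> future = queryMetadataAsync(input.key);
                future.whenComplete((metadata, err) -> {
                    if (err != null) {
                        resultFuture.completeExceptionally(err);
                    } else {
                        EnrichedRecord out = new EnrichedRecord();
                        out.key = input.key;
                        out.metadata = metadata;
                        resultFuture.complete(Collections.singleton(out));
                    }
                });
            }

            private CompletableFuture<String> queryMetadataAsync(String key) {
                // Placeholder: wire in your actual async HBase / REST client here.
                return CompletableFuture.completedFuture("metadata-for-" + key);
            }
        }

        public static DataStream<EnrichedRecord> enrich(DataStream<UserRecord> input) {
            // timeout = 5s per request, capacity = 100 in-flight requests.
            // The capacity bounds how many pending lookups are buffered; if the
            // external service slows down, the operator back-pressures instead
            // of accumulating unbounded in-flight data.
            return AsyncDataStream.unorderedWait(
                    input, new MetadataLookup(), 5000, TimeUnit.MILLISECONDS, 100);
        }
    }

If the external DB or HBase starts responding slowly after 1-2 hours, a bounded capacity like this makes the slowdown show up as back pressure on the lookup operator in the web UI, which is exactly where the check in [1] would point you.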
Would you please provide more information about the job, for example the back pressure status, the input data distribution, and whether you use async or sync lookup mode?

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/back_pressure/

Best,
JING ZHANG

Ragini Manjaiah <ragini.manja...@gmail.com> wrote on Mon, Sep 27, 2021 at 2:05 PM:

> Hi,
> I have a Flink real-time job which processes user records from a topic and
> also reads data from HBase acting as a lookup table. If the lookup table
> does not contain the required metadata, it queries the external DB via an API.
> For the first 1 to 2 hours it works fine without issues; later the throughput
> drops drastically to 30 TPS. What are the things I need to look into in such a
> situation? There are no exceptions caught. How do I find the bottleneck
> area? Can someone throw some light on this?
>
>
> Thanks & Regards
> Ragini Manjaiah