Hi,

Since there is not enough information, you could first check the back pressure status of the job [1] and find the task which causes the back pressure. Then try to find out why that task processes data slowly. There are many possible reasons, for example:
(1) Does data skew exist, i.e. do some tasks process more input data than the other tasks?
(2) Is the CPU cost very high?
(3) Have the RPC responses started to slow down?
(4) If you choose async lookup mode, the LookupJoin operator needs to buffer some data in state. Which state backend do you use? Does the state backend work fine?
...
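In case it helps with point (3) and (4): if your job does the lookup with the DataStream API rather than a SQL lookup join, a common pattern is Flink's AsyncDataStream, where the capacity parameter bounds the number of in-flight requests. Below is only a minimal sketch under that assumption; the record types and the queryMetadataAsync helper are placeholders for your own HBase / external-DB client, not real APIs.

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class AsyncLookupSketch {

        // Placeholder record types, only for illustration.
        public static class UserRecord { public String key; }
        public static class EnrichedRecord { public String key; public String metadata; }

        public static class MetadataLookup extends RichAsyncFunction<UserRecord, EnrichedRecord> {
            @Override
            public void asyncInvoke(UserRecord input, ResultFuture<EnrichedRecord> resultFuture) {
                // queryMetadataAsync is a hypothetical non-blocking client call
                // (e.g. HBase first, then the external DB on a cache miss).
                CompletableFuture<String> future = queryMetadataAsync(input.key);
                future.whenComplete((metadata, err) -> {
                    if (err != null) {
                        resultFuture.completeExceptionally(err);
                    } else {
                        EnrichedRecord out = new EnrichedRecord();
                        out.key = input.key;
                        out.metadata = metadata;
                        resultFuture.complete(Collections.singleton(out));
                    }
                });
            }

            private CompletableFuture<String> queryMetadataAsync(String key) {
                // Placeholder: wire in your actual async HBase / REST client here.
                return CompletableFuture.completedFuture("metadata-for-" + key);
            }
        }

        public static DataStream<EnrichedRecord> enrich(DataStream<UserRecord> input) {
            // timeout = 5s per request, capacity = 100 in-flight requests.
            // The capacity bounds how many pending lookups are buffered; if the
            // external service slows down, the operator back-pressures instead
            // of accumulating unbounded in-flight data.
            return AsyncDataStream.unorderedWait(
                    input, new MetadataLookup(), 5000, TimeUnit.MILLISECONDS, 100);
        }
    }

If the external DB or HBase starts responding slowly after 1-2 hours, a bounded capacity like this makes the slowdown show up as back pressure on the lookup operator in the web UI, which is exactly where the check in [1] would point you.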
Would you please provide more information about the job, for example the back pressure status, the input data distribution, and whether you use async or sync lookup mode?

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/back_pressure/

Best,
JING ZHANG

Ragini Manjaiah <ragini.manja...@gmail.com> wrote on Mon, Sep 27, 2021 at 2:05 PM:

> Hi,
> I have a Flink real-time job which processes user records from a topic and
> also reads data from HBase acting as a lookup table. If the lookup table
> does not contain the required metadata, it queries the external DB via an API.
> For the first 1 to 2 hours it works fine without issues; later the throughput
> drops drastically to 30 TPS. What are the things I need to look into in such a
> situation? There are no exceptions caught. How do I find the bottleneck
> area? Can someone throw some light on this?
>
>
> Thanks & Regards
> Ragini Manjaiah