Hi Dan, Would you please describe what's the problem about your job? High latency or low throughput? Please first check the job throughput and latency . If the job throughput matches the speed of sources producing data and the latency metric is good, maybe the job works well without bottlenecks. If you find unnormal throughput or latency, please try the following points: 1. check the back pressure 2. check whether checkpoint duration is long and whether the checkpoint size is expected
Please share the details for deeper analysis in this email if you find something abnormal about the job. Best, JING ZHANG Dan Hill <quietgol...@gmail.com> 于2021年6月17日周四 下午12:44写道: > We have a job that has been running but none of the AWS resource metrics > for the EKS, EC2, MSK and EBS show any bottlenecks. I have multiple 8 > cores allocated but only ~2 cores are used. Most of the memory is not > consumed. MSK does not show much use. EBS metrics look mostly idle. I > assumed I'd be able to see whichever resources is a bottleneck. > > Is there a good way to diagnose where the bottleneck is for a Flink job? >