Unsubscribe
Sent from my iPhone

------------------ Original ------------------
From: Abhishek Singla <abhisheksingla...@gmail.com>
Date: Sun, Mar 26, 2023 11:58 PM
To: user <user@flink.apache.org>
Subject: Re: Flink CEP Resource Utilisation Optimisation

Hi Team,

Flink Version: 1.15.0
Java Version: 1.8
Standalone Cluster
Task Manager: AWS EC2 of Instance Type c5n.4xlarge (vCPU 16, Memory 42 GB, 8 slots per TM)
CEP Scenario: Kafka Event A followed by Kafka Event B within 10 mins
Throughput: 20k events per second for Event A, 0 for Kafka Event B
State Backend: FsStateBackend
Unaligned Checkpoints: Enabled
asynchronousSnapshots: true

While testing this scenario (Kafka Event A followed by Kafka Event B within 10 mins) in our load environment, it took 20 TM nodes to achieve this throughput. With fewer nodes, either CPU utilization reached its peak or backpressure was observed because the output buffers were full. The checkpoint size is only 6.75 GB, and the state stored within the CEP operator itself should be much smaller, since we use unaligned checkpoints.

I am looking for input on whether this throughput really requires this many resources, and if not, what the issue might be.

I found one more issue: if the throughput of Event A drops to zero, the checkpoint size still stays around 2 GB, even after hours. Is this expected?

Regards,
Abhishek Singla
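
For scale, the figures quoted above can be turned into a rough back-of-envelope estimate. This is only a sketch under stated assumptions, not a diagnosis: it assumes load is spread evenly over all slots, that every Event A opens one partial match that is held for the full 10 minutes (Event B never arrives), and that "GB" means 10^9 bytes. The variable names are illustrative and not part of any Flink API.

```python
# Back-of-envelope numbers derived from the figures in the email.
# Assumptions (not from the email): even load distribution across slots,
# one pending partial match per Event A for the whole window, GB = 10**9 bytes.

task_managers = 20            # TM nodes needed in the load test
slots_per_tm = 8              # slots per TM
vcpus_per_tm = 16             # c5n.4xlarge
events_per_sec = 20_000       # Event A throughput
window_sec = 10 * 60          # "within 10 mins"
checkpoint_bytes = 6.75e9     # reported checkpoint size

total_slots = task_managers * slots_per_tm
total_vcpus = task_managers * vcpus_per_tm

# Per-slot and per-vCPU throughput if load is evenly spread.
events_per_slot = events_per_sec / total_slots
events_per_vcpu = events_per_sec / total_vcpus

# Event A instances waiting for an Event B at steady state.
pending_matches = events_per_sec * window_sec

# Upper bound on bytes per pending match: the checkpoint also contains
# in-flight buffers (unaligned checkpoints) and Kafka source state.
bytes_per_pending = checkpoint_bytes / pending_matches

print(f"slots: {total_slots}, vCPUs: {total_vcpus}")
print(f"events/s per slot: {events_per_slot}, per vCPU: {events_per_vcpu}")
print(f"pending partial matches: {pending_matches}")
print(f"checkpoint bytes per pending match (upper bound): {bytes_per_pending}")
```

Under these assumptions each slot handles about 125 events per second while roughly 12 million partial matches are pending at any time, which suggests the cost is more likely tied to per-event state handling than to raw event volume.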