[ https://issues.apache.org/jira/browse/FLINK-36939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943940#comment-17943940 ]
Keith Lee commented on FLINK-36939:
-----------------------------------

Refactored the changes for https://issues.apache.org/jira/browse/FLINK-36947, making changes to KinesisShardSplitReaderBase so that both the high CPU utilisation in EFO mode reported here and the GetRecords throttling in polling mode are addressed.

See PR: https://github.com/apache/flink-connector-aws/pull/195

> High CPU Utilization with Flink Kinesis EFO Consumer
> ----------------------------------------------------
>
>                 Key: FLINK-36939
>                 URL: https://issues.apache.org/jira/browse/FLINK-36939
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kinesis
>    Affects Versions: 1.20.0, aws-connector-5.0.0
>            Reporter: Keith Lee
>            Priority: Major
>         Attachments: Main.kt, Screenshot 1734584639640.png, Screenshot
> 1734584781285.png, image-2025-01-10-12-43-29-262.png,
> image-2025-01-10-12-44-48-869.png, image-2025-01-10-12-51-04-104.png,
> image-2025-01-10-12-51-36-141.png, image.png
>
>
> Observation: When EFO is enabled, the CPU usage spikes and stays elevated,
> regardless of record volume. If we switch back to the standard polling
> consumer (disabling EFO), CPU utilization returns to normal levels.
> Profiling Results: Local profiling and flamegraphs suggest the connector may
> be engaged in a busy-wait loop, continuously parking and un-parking threads
> even when no data is available. This behavior consumes CPU cycles
> unnecessarily.
> Performance Impact: While the job still processes records correctly when they
> arrive, the high baseline CPU consumption is concerning. It wastes resources
> and triggers unnecessary scaling, which doesn’t resolve the issue since new
> instances also experience the same CPU pattern.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
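
For readers unfamiliar with the pattern named under "Profiling Results" above, here is a minimal sketch of why polling an empty queue without waiting can keep CPU usage high even when no records arrive. It is not the connector's code: the class IdlePollSketch and the methods busyPoll and blockingPoll are hypothetical names made up for illustration, and the queue is only a stand-in for whatever buffer a split reader drains. It compares how many loop iterations happen in a fixed idle window with a non-blocking poll versus a poll that waits with a timeout.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from flink-connector-aws.
public class IdlePollSketch {

    /** Polls without waiting; on an empty queue the loop spins for the whole window. */
    public static long busyPoll(BlockingQueue<String> queue, long windowMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            queue.poll(); // returns null immediately when the queue is empty
            iterations++;
        }
        return iterations;
    }

    /** Polls with a timeout; the thread waits until data arrives or the timeout elapses. */
    public static long blockingPoll(BlockingQueue<String> queue, long windowMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            try {
                queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        BlockingQueue<String> empty = new LinkedBlockingQueue<>();
        System.out.println("busy iterations:     " + busyPoll(empty, 200));
        System.out.println("blocking iterations: " + blockingPoll(empty, 200, 50));
    }
}
```

On an idle queue the first loop typically runs a very large number of iterations in the window, while the second runs only a handful, which matches the shape of the symptom described in the issue (CPU elevated regardless of record volume). Whether the change in the linked PR takes this form is not stated here; see the PR for the actual fix.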