ming li created FLINK-24815:
-------------------------------

             Summary: Reduce the CPU cost of calculating stateSize during state allocation
                 Key: FLINK-24815
                 URL: https://issues.apache.org/jira/browse/FLINK-24815
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
            Reporter: ming li

When a task fails over, we reassign the state for each subtask and create a new {{OperatorSubtaskState}} object. At that point, the {{stateSize}} field of the {{OperatorSubtaskState}} is recalculated. With incremental {{Checkpoint}}, computing this field requires traversing all shared states and summing their sizes. For example, a job with a parallelism of 2000 and 100 shared states per task needs 2000 * 100 = 200,000 traversal steps. While this runs, the CPU of the JM scheduling thread is saturated.

I think we can either provide a constructor for {{OperatorSubtaskState}} that accepts a precomputed {{stateSize}}, or delay the calculation of {{stateSize}} until it is actually needed.
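The two options proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{SubtaskState}} is a hypothetical, simplified stand-in for {{OperatorSubtaskState}}, shared state handles are modelled as a plain list of sizes in bytes, and the {{traversals}} counter exists purely to show when a traversal happens. None of these names or signatures are taken from Flink's code base.

```java
import java.util.Arrays;
import java.util.List;

class StateSizeSketch {

    /** Hypothetical, simplified stand-in for OperatorSubtaskState (not Flink's real class). */
    static final class SubtaskState {
        private static final long NOT_COMPUTED = -1L;

        // Assumption: each shared state handle is reduced to its size in bytes.
        private final List<Long> sharedStateSizes;
        private long stateSize;   // cached total, or NOT_COMPUTED
        private int traversals;   // number of full traversals, for illustration only

        /** Delayed calculation: nothing is summed until getStateSize() is first called. */
        SubtaskState(List<Long> sharedStateSizes) {
            this(sharedStateSizes, NOT_COMPUTED);
        }

        /** Precomputed size: the caller passes a value it already knows. */
        SubtaskState(List<Long> sharedStateSizes, long knownStateSize) {
            this.sharedStateSizes = sharedStateSizes;
            this.stateSize = knownStateSize;
        }

        /** Not thread-safe; a real implementation would need to consider concurrent access. */
        long getStateSize() {
            if (stateSize == NOT_COMPUTED) {
                long sum = 0L;
                for (long size : sharedStateSizes) {
                    sum += size;
                }
                traversals++;
                stateSize = sum;
            }
            return stateSize;
        }

        int getTraversals() {
            return traversals;
        }
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 200L, 300L);

        // Construction alone does not traverse the shared states.
        SubtaskState lazy = new SubtaskState(sizes);
        System.out.println("traversals after construction: " + lazy.getTraversals());

        // The first query pays for one traversal; later queries reuse the cached value.
        System.out.println("stateSize: " + lazy.getStateSize());
        lazy.getStateSize();
        System.out.println("traversals after two queries: " + lazy.getTraversals());

        // With a precomputed size, the shared states are never traversed.
        SubtaskState precomputed = new SubtaskState(sizes, 600L);
        System.out.println("precomputed stateSize: " + precomputed.getStateSize());
    }
}
```

In this sketch, reassigning state for 2000 subtasks would only allocate objects; the summation cost is paid per subtask on first use, or not at all when the size is passed in. Whether a precomputed size can be trusted after state is redistributed is a question the sketch does not answer.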