ming li created FLINK-24815:
-------------------------------

             Summary: Reduce the cpu cost of calculating stateSize during state 
allocation
                 Key: FLINK-24815
                 URL: https://issues.apache.org/jira/browse/FLINK-24815
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
            Reporter: ming li


When a task fails over, the state is reassigned to each subtask and a new 
{{OperatorSubtaskState}} object is created for it. Each new object 
recalculates its {{stateSize}} field. With incremental checkpoints, this 
calculation has to traverse all shared states and sum their sizes.

Taking a job with a parallelism of 2000 and 100 shared states per task as an 
example, the calculation visits 2000 * 100 = 200,000 shared states. While this 
runs, the CPU of the JobManager (JM) scheduling thread is saturated.
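
A minimal sketch of the cost described above, not Flink's actual code. The 
classes {{SharedStateHandle}}, {{EagerSubtaskState}} and {{EagerStateSizeDemo}} 
are hypothetical stand-ins, and the 1024-byte handle size is arbitrary. It only 
shows that summing the sizes in the constructor makes every reassignment pay 
one visit per shared state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a shared state handle (not a Flink class). */
final class SharedStateHandle {
    private final long sizeBytes;

    SharedStateHandle(long sizeBytes) {
        this.sizeBytes = sizeBytes;
    }

    long getStateSize() {
        return sizeBytes;
    }
}

/** Hypothetical model of a subtask state that sums sizes eagerly. */
final class EagerSubtaskState {
    /** Counts how many shared states were visited across all instances. */
    static long handleVisits = 0;

    private final long stateSize;

    EagerSubtaskState(List<SharedStateHandle> sharedStates) {
        long sum = 0;
        for (SharedStateHandle handle : sharedStates) {
            handleVisits++;
            sum += handle.getStateSize();
        }
        this.stateSize = sum;
    }

    long getStateSize() {
        return stateSize;
    }
}

final class EagerStateSizeDemo {
    /** Creates one state object per subtask and returns the number of visits. */
    static long reassignAll(int parallelism, int sharedStatesPerSubtask) {
        EagerSubtaskState.handleVisits = 0;
        List<SharedStateHandle> sharedStates = new ArrayList<>();
        for (int i = 0; i < sharedStatesPerSubtask; i++) {
            sharedStates.add(new SharedStateHandle(1024L)); // arbitrary size
        }
        for (int subtask = 0; subtask < parallelism; subtask++) {
            new EagerSubtaskState(sharedStates);
        }
        return EagerSubtaskState.handleVisits;
    }

    public static void main(String[] args) {
        // 2000 subtasks * 100 shared states each
        System.out.println(reassignAll(2000, 100));
    }
}
```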

I think we can either provide a constructor for {{OperatorSubtaskState}} that 
accepts a precomputed {{stateSize}}, or defer the calculation of {{stateSize}} 
until it is needed.
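
Both options could look roughly like the sketch below. This is an 
illustration under assumptions, not a patch: {{SizedHandle}}, 
{{LazySubtaskState}} and {{LazyStateSizeDemo}} are hypothetical names, the -1 
sentinel assumes sizes are never negative, and the caching is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a shared state handle (not a Flink class). */
final class SizedHandle {
    private final long sizeBytes;

    SizedHandle(long sizeBytes) {
        this.sizeBytes = sizeBytes;
    }

    long getStateSize() {
        return sizeBytes;
    }
}

/** Hypothetical subtask state with a precomputed or lazily computed size. */
final class LazySubtaskState {
    /** Counts how many shared states were visited across all instances. */
    static long handleVisits = 0;

    private static final long UNKNOWN = -1L; // assumes sizes are >= 0

    private final List<SizedHandle> sharedStates;
    private long stateSize;

    /** Option 2: defer the calculation until getStateSize() is called. */
    LazySubtaskState(List<SizedHandle> sharedStates) {
        this(sharedStates, UNKNOWN);
    }

    /** Option 1: the caller passes a size it already knows. */
    LazySubtaskState(List<SizedHandle> sharedStates, long knownStateSize) {
        this.sharedStates = sharedStates;
        this.stateSize = knownStateSize;
    }

    /** Sums the sizes on first use and caches the result (not thread-safe). */
    long getStateSize() {
        if (stateSize == UNKNOWN) {
            long sum = 0;
            for (SizedHandle handle : sharedStates) {
                handleVisits++;
                sum += handle.getStateSize();
            }
            stateSize = sum;
        }
        return stateSize;
    }
}

final class LazyStateSizeDemo {
    static List<SizedHandle> handles(int count) {
        List<SizedHandle> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(new SizedHandle(1024L)); // arbitrary size
        }
        return result;
    }

    /** Creates one state object per subtask without reading any size. */
    static long visitsAfterReassign(int parallelism, int sharedStatesPerSubtask) {
        LazySubtaskState.handleVisits = 0;
        List<SizedHandle> sharedStates = handles(sharedStatesPerSubtask);
        for (int subtask = 0; subtask < parallelism; subtask++) {
            new LazySubtaskState(sharedStates);
        }
        return LazySubtaskState.handleVisits;
    }

    public static void main(String[] args) {
        System.out.println(visitsAfterReassign(2000, 100));
    }
}
```

In this sketch, creating the objects visits no shared states at all. The 
traversal cost is paid only by a caller that actually reads the size, and only 
once per object.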



--
This message was sent by Atlassian Jira
(v8.20.1#820001)