PeterZh6 opened a new pull request, #10915: URL: https://github.com/apache/inlong/pull/10915
Fixes #7056

### Motivation

Currently, the total amount of resources for a Flink Sort job comes from the configuration file `flink-sort-plugin.properties`, so every submitted sort job uses the same amount of resources. When the data scale is large, resources may be insufficient; when the data scale is small, resources may be wasted. Dynamically adjusting resources according to the data volume is therefore a critically needed feature.

### Modifications

Before a job is submitted to Flink via `org.apache.inlong.manager.plugin.flink.FlinkService#submitJobBySavepoint`, the `org.apache.inlong.manager.plugin.flink.FlinkParallelismOptimizer` first queries the average data volume over the past hour and adjusts the parallelism based on that volume. This feature can be switched on or off, and the maximum number of messages per core can be configured in `flink-sort-plugin.properties`.

### Verifying this change

- [ ] This change is a trivial rework/code cleanup without any test coverage.
- [x] This change is already covered by existing tests, such as: when creating a stream in Data Ingestion, make the source data increase steadily until it reaches a significant volume (approximately more than 2000 messages per hour), then resubmit the job. The parallelism of the Flink job corresponding to the stream should now be larger than the default value of 1. This change is also reflected in the manager logs.
- [ ] This change added tests and can be verified as follows:
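To illustrate the idea behind the optimizer, here is a minimal sketch of how parallelism could be derived from the recent message rate. The names (`calculateParallelism`, `MAX_MSG_PER_CORE`) and the linear scaling rule are assumptions for illustration, not the actual `FlinkParallelismOptimizer` implementation:

```java
// Hypothetical sketch: derive Flink job parallelism from the average
// message volume observed over the past hour. Names and constants are
// illustrative; the real values come from flink-sort-plugin.properties.
public class ParallelismSketch {

    // Assumed configurable threshold: max messages one core handles per hour.
    static final long MAX_MSG_PER_CORE = 2000L;

    // Default parallelism used when the data volume is small.
    static final int DEFAULT_PARALLELISM = 1;

    static int calculateParallelism(long avgMessagesPerHour) {
        // Ceiling division: one extra core for any partial remainder,
        // never dropping below the default parallelism.
        long cores = (avgMessagesPerHour + MAX_MSG_PER_CORE - 1) / MAX_MSG_PER_CORE;
        return (int) Math.max(DEFAULT_PARALLELISM, cores);
    }

    public static void main(String[] args) {
        System.out.println(calculateParallelism(500));   // light load stays at the default: 1
        System.out.println(calculateParallelism(5000));  // heavier load scales up: 3
    }
}
```

With this shape, a low-volume stream keeps the default parallelism of 1, matching the behavior described in the verification steps, while a stream exceeding the per-core threshold is scaled up before submission.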