PeterZh6 opened a new pull request, #10915: URL: https://github.com/apache/inlong/pull/10915
Fixes #7056

### Motivation

Currently, the total amount of resources for a Flink Sort job comes from the configuration file `flink-sort-plugin.properties`, so every submitted sort job uses the same amount of resources. When the data scale is large, resources may be insufficient; when the data scale is small, resources may be wasted. Dynamically adjusting resources according to the data volume is therefore a critically needed feature.

### Modifications

Before a job is submitted to Flink via `org.apache.inlong.manager.plugin.flink.FlinkService#submitJobBySavepoint`, the `org.apache.inlong.manager.plugin.flink.FlinkParallelismOptimizer` first queries the average data volume over the past hour and adjusts the parallelism based on that volume. This feature can be switched on or off, and the maximum number of messages per core can be configured in `flink-sort-plugin.properties`.

### Verifying this change

- [ ] This change is a trivial rework/code cleanup without any test coverage.
- [x] This change is already covered by existing tests, such as: when creating a stream in Data Ingestion, make the source data increase steadily until it reaches a significant volume (approximately more than 2000 messages per hour), then resubmit the job. The parallelism of the Flink job corresponding to the stream should now be larger than the default value of 1. This change is also reflected in the manager logs.
- [ ] This change added tests and can be verified as follows:
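To illustrate the idea behind the optimizer, here is a minimal sketch of how parallelism could be derived from the recent message rate. The names (`calculateParallelism`, `MAX_MSG_PER_CORE`) and the linear scaling rule are assumptions for illustration, not the actual `FlinkParallelismOptimizer` implementation:

```java
// Hypothetical sketch: derive Flink job parallelism from the average
// message volume observed over the past hour. Names and constants are
// illustrative; the real values come from flink-sort-plugin.properties.
public class ParallelismSketch {

    // Assumed configurable threshold: max messages one core handles per hour.
    static final long MAX_MSG_PER_CORE = 2000L;

    // Default parallelism used when the data volume is small.
    static final int DEFAULT_PARALLELISM = 1;

    static int calculateParallelism(long avgMessagesPerHour) {
        // Ceiling division: one extra core for any partial remainder,
        // never dropping below the default parallelism.
        long cores = (avgMessagesPerHour + MAX_MSG_PER_CORE - 1) / MAX_MSG_PER_CORE;
        return (int) Math.max(DEFAULT_PARALLELISM, cores);
    }

    public static void main(String[] args) {
        System.out.println(calculateParallelism(500));   // light load stays at the default: 1
        System.out.println(calculateParallelism(5000));  // heavier load scales up: 3
    }
}
```

With this shape, a low-volume stream keeps the default parallelism of 1, matching the behavior described in the verification steps, while a stream exceeding the per-core threshold is scaled up before submission.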