TheR1sing3un opened a new pull request, #13350: URL: https://github.com/apache/hudi/pull/13350
I found that on the hot paths for reading and writing, some string-related operations waste a lot of CPU time. For example, as shown in the following figure, operations related to `String#format` account for more than 20% of the time on the write path.

<img width="532" alt="image" src="https://github.com/user-attachments/assets/692c2507-0678-4b93-aaa1-4a50819327d3" />
<img width="136" alt="image" src="https://github.com/user-attachments/assets/0785aa2b-0566-4190-86d2-bf7a0b403a43" />

Each record goes through many string concatenations during processing. When `String#format` is used for concatenation, the JVM has to parse the format string and then build the resulting string on every call, which is expensive.

There are two ways to optimize this. One is to parse and compile the format in advance, so that each call only has to concatenate the strings. The other is to use `StringBuilder`. In my micro benchmark, the former improved performance by nearly 10x and the latter by nearly 100x, so I used the latter.

### Change Logs

1. Replace `String#format` with `StringBuilder` on hot paths, where it caused a performance loss.

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

Improves the performance of string-related operations.

### Risk level (write none, low medium or high below)

low

### Documentation Update

none

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default values of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
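
As a rough illustration of the two optimizations described in this PR (parsing the format once up front versus plain `StringBuilder` concatenation), here is a minimal Java sketch. It is not taken from the PR diff: the class `StringConcatSketch`, the helper `Template`, and the parameter names `left` and `right` are hypothetical, and the `Template` class handles only plain `%s` placeholders, so it is an assumption about what "compile the format in advance" could look like rather than the approach Hudi uses.

```java
final class StringConcatSketch {

  // Baseline: the format string "%s_%s" is parsed again on every call.
  static String withFormat(String left, String right) {
    return String.format("%s_%s", left, right);
  }

  // StringBuilder variant: no format parsing, just appends into a presized buffer.
  static String withBuilder(String left, String right) {
    return new StringBuilder(left.length() + 1 + right.length())
        .append(left)
        .append('_')
        .append(right)
        .toString();
  }

  // Hypothetical "precompiled" format: split the pattern at "%s" once,
  // then only interleave the literal segments with the arguments per call.
  static final class Template {
    private final String[] literals;

    Template(String format) {
      // The -1 limit keeps trailing empty segments, e.g. "%s_%s" -> ["", "_", ""].
      this.literals = format.split("%s", -1);
    }

    String apply(String... args) {
      if (args.length != literals.length - 1) {
        throw new IllegalArgumentException("expected " + (literals.length - 1) + " arguments");
      }
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < args.length; i++) {
        sb.append(literals[i]).append(args[i]);
      }
      return sb.append(literals[literals.length - 1]).toString();
    }
  }

  public static void main(String[] args) {
    Template template = new Template("%s_%s"); // parsed once, reused for every record
    System.out.println(withFormat("2025/05/26", "file-1"));
    System.out.println(withBuilder("2025/05/26", "file-1"));
    System.out.println(template.apply("2025/05/26", "file-1"));
  }
}
```

All three variants produce the same string; they differ only in how much work is repeated per call. Any speedup should be confirmed with a proper benchmark harness such as JMH, since naive timing loops on the JVM are easily distorted by JIT warm-up.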
