I would like to report a performance regression when using StringSubstitutor with large strings, which our application experienced after upgrading to v1.9+. However, there’s no public signup for the ASF Jira anymore, so I’m hoping my report here will suffice. If not, please create an account for me and I’ll file a proper ticket for it.
As of v1.9, StringSubstitutor no longer pre-converts the TextStringBuilder to a char[] (see this commit: https://github.com/apache/commons-text/commit/248af06171e14648e00ce0873c5f95e03041a6c7 ) and instead reuses the TextStringBuilder API. A new default method ( https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/matcher/StringMatcher.java#L146 ) was added to StringMatcher that takes a CharSequence (which TextStringBuilder implements) and handles the conversion. However, it calls CharSequenceUtils.toCharArray(buffer), which is not aware of TextStringBuilder. The buffer is not a String, for which the conversion is optimized, and CharSequence itself offers no way to expose a backing array, so the conversion cannot be optimized.

When a custom StringMatcher implementation does not override this default method (the stock matchers do override it), every call makes a full copy of the CharSequence. This adds up very quickly when the text is large and many replacements are being made. Methods of ours that used to take 3 seconds now take upwards of a minute.

Fortunately, our custom matcher is a simple OrStringMatcher (not provided out of the box) that delegates to stock StringMatchers created by StringMatcherFactory.stringMatcher(…), which have their own optimized implementation of the method. We were able to resolve this ourselves by overriding the method and delegating directly to the optimized implementation of the stock matchers, thus bypassing the CharSequenceUtils.toCharArray(buffer) penalty completely. But others may not be as fortunate.

Perhaps the default method could be made aware of TextStringBuilder and use its package-private getBuffer() method instead? Or maybe there’s a better way to solve it.

Hopefully, I’ve explained it clearly enough. Please reach out with any further questions.
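In case it helps, here is a rough, self-contained sketch of the kind of override described above. It does not use the real Commons Text classes: the Matcher interface, LiteralMatcher, and OrMatcher are hypothetical stand-ins that only mimic the shape of StringMatcher, a stock matcher, and our OrStringMatcher, and the copies counter exists purely to show when the copying fallback runs.

```java
class OrMatcherSketch {

    // Counts how often the copying fallback runs (demonstration only).
    static int copies = 0;

    /** Hypothetical stand-in for the real StringMatcher interface. */
    interface Matcher {
        /** Returns the number of matching chars at start, or 0 for no match. */
        int isMatch(char[] buffer, int start, int bufferStart, int bufferEnd);

        /** Fallback that copies the whole sequence on every call. */
        default int isMatch(CharSequence buffer, int start, int bufferStart, int bufferEnd) {
            copies++;
            return isMatch(buffer.toString().toCharArray(), start, bufferStart, bufferEnd);
        }
    }

    /** Stand-in for a stock matcher: overrides the CharSequence variant without copying. */
    static final class LiteralMatcher implements Matcher {
        private final String literal;

        LiteralMatcher(String literal) {
            this.literal = literal;
        }

        @Override
        public int isMatch(char[] buffer, int start, int bufferStart, int bufferEnd) {
            if (start + literal.length() > bufferEnd) {
                return 0;
            }
            for (int i = 0; i < literal.length(); i++) {
                if (buffer[start + i] != literal.charAt(i)) {
                    return 0;
                }
            }
            return literal.length();
        }

        @Override
        public int isMatch(CharSequence buffer, int start, int bufferStart, int bufferEnd) {
            if (start + literal.length() > bufferEnd) {
                return 0;
            }
            for (int i = 0; i < literal.length(); i++) {
                if (buffer.charAt(start + i) != literal.charAt(i)) {
                    return 0;
                }
            }
            return literal.length();
        }
    }

    /** Stand-in for our OrStringMatcher: the first delegate that matches wins. */
    static final class OrMatcher implements Matcher {
        private final Matcher[] delegates;

        OrMatcher(Matcher... delegates) {
            this.delegates = delegates;
        }

        @Override
        public int isMatch(char[] buffer, int start, int bufferStart, int bufferEnd) {
            for (Matcher m : delegates) {
                int len = m.isMatch(buffer, start, bufferStart, bufferEnd);
                if (len > 0) {
                    return len;
                }
            }
            return 0;
        }

        // The fix: forward the CharSequence itself so the delegates' own
        // non-copying overrides are used instead of the default fallback.
        @Override
        public int isMatch(CharSequence buffer, int start, int bufferStart, int bufferEnd) {
            for (Matcher m : delegates) {
                int len = m.isMatch(buffer, start, bufferStart, bufferEnd);
                if (len > 0) {
                    return len;
                }
            }
            return 0;
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("ab$[x]");
        Matcher or = new OrMatcher(new LiteralMatcher("${"), new LiteralMatcher("$["));
        System.out.println(or.isMatch(text, 2, 0, text.length())); // 2
        System.out.println(copies); // 0, the fallback was never used
    }
}
```

Without the second override in OrMatcher, every call with a CharSequence would go through the copying default, which is the cost we were seeing on large inputs.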
--
David Becker
Senior IT Engineer
