Hello,

I originally reported this on the bug tracker, but was asked to post the topic to this mailing list first. I was told that a bug report would be created afterwards.

The internal method `java.lang.ConditionalSpecialCasing#lookUpTable` is used for special case conversion rules. It is called when either the specified locale has special casing rules (e.g. Turkish) or the string to convert contains characters with special casing rules, for example U+0130 (Latin Capital Letter I with Dot Above).

The problem with this method is that it creates temporary objects. Given that in the worst case the method is called for every character (possibly even twice per character), this can cause a lot of temporary memory allocation for large strings.

Below is the original bug report description (slightly modified), with a proposal for how it can (at least in part) be implemented without allocating any temporary objects; feedback is appreciated. I am not a JDK member and therefore cannot submit a pull request for this.

Kind regards

--------------------------

There are two issues with the method `lookUpTable` of the internal class `java.lang.ConditionalSpecialCasing`, which is used for special case conversion:

- It uses the int code point as the key for a `Map<Integer, ...>` to look up the case conversion; this wraps the int as an Integer
- The special case conversion entries are stored in a `HashSet<Entry>`
  - First of all, usage of a Set seems redundant because Entry does not even override `equals`, and it looks like distinct Entry instances are always added to the Set
  - Usage of a Set means a new Iterator object is created whenever case conversion entries are found for a code point

It looks like both of these can be fixed, for example in the following way:

1. Remove `ConditionalSpecialCasing.Entry.ch` (and the corresponding getter)
2. Remove the static field `ConditionalSpecialCasing.entry`
3. For every existing entry, add a static final field `entry<codepoint>` storing an `Entry[]` (`<codepoint>` being a placeholder for the code point hex string)
4. In `ConditionalSpecialCasing.lookUpTable`, use a `switch` to access the corresponding `entry...` field

Here is a short example snippet showing that:

```
private static final Entry[] entry0069 = {
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0)  // # LATIN SMALL LETTER I
};
...
private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
    Entry[] entries = switch (src.codePointAt(index)) {
        case 0x0069 -> entry0069;
        ...
        default -> null;
    };

    char[] ret = null;
    if (entries != null) {
        String currentLang = locale.getLanguage();
        for (Entry entry : entries) {
            String conditionLang = entry.getLanguage();
            ...
        }
    }
    return ret;
}
```

Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite problematic because it creates a new StringCharacterIterator and a RuleBasedBreakIterator for each call. Unfortunately I don't know of an easy way to avoid this; it would be great if you could investigate solving it nonetheless, in the worst case with a ThreadLocal or similar.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Profile the object allocations of the `toLowerCase` calls of the following code snippets, for example with VisualVM:

1. Snippet:
```
String s = "\u0130".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

2. Snippet:
```
String s = "\u03A3".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

ACTUAL -
1. Snippet:
   - 2000 Integer objects created
   - 2000 HashMap$KeyIterator objects created
2. Snippet:
   - 1000 Integer objects created
   - 1000 HashMap$KeyIterator objects created
   - 1000 StringCharacterIterator objects created
   - 1000 RuleBasedBreakIterator objects created
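
For anyone who wants to try the proposed `switch`-based lookup outside the JDK, here is a minimal, self-contained sketch. It is not the actual JDK code: the `Entry` class below is a simplified, hypothetical stand-in, the class name `SwitchLookupSketch` is made up, only the one table entry from the snippet above is included, and the condition handling (e.g. final-cased checks) is left out entirely. It only illustrates that neither the `switch` on the int code point nor the loop over an array needs an Integer or an Iterator.

```java
import java.util.Locale;

// Hypothetical sketch class; not part of the JDK.
final class SwitchLookupSketch {

    // Simplified stand-in for ConditionalSpecialCasing.Entry (assumption:
    // the real class has more behavior, e.g. condition evaluation).
    static final class Entry {
        final char[] lower;
        final char[] upper;
        final String lang;
        final int condition; // kept for shape only; ignored in this sketch

        Entry(char[] lower, char[] upper, String lang, int condition) {
            this.lower = lower;
            this.upper = upper;
            this.lang = lang;
            this.condition = condition;
        }
    }

    // One array per code point, as proposed; values taken from the snippet above.
    private static final Entry[] entry0069 = {
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0),
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0)
    };

    static char[] lookUpTable(String src, int index, Locale locale, boolean lowerCasing) {
        // Switching on the primitive int avoids boxing it into an Integer.
        Entry[] entries = switch (src.codePointAt(index)) {
            case 0x0069 -> entry0069;
            default -> null;
        };
        if (entries == null) {
            return null;
        }

        String currentLang = locale.getLanguage();
        // The enhanced for loop over an array is index-based; no Iterator is created.
        for (Entry entry : entries) {
            if (entry.lang.equals(currentLang)) {
                return lowerCasing ? entry.lower : entry.upper;
            }
        }
        return null;
    }
}
```

Calling `SwitchLookupSketch.lookUpTable("i", 0, Locale.forLanguageTag("tr"), false)` returns the shared `char[]` containing U+0130, while the same call with `Locale.ROOT` returns `null` because no entry matches the empty language.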