On Tue, 13 Jun 2023 17:56:57 GMT, Naoto Sato <na...@openjdk.org> wrote:

>> This is stemming from the PR: https://github.com/openjdk/jdk/pull/14211 
>> where aggressive GC can cause NPE in `BaseLocale$Key` class. I refactored 
>> the in-house cache with WeakHashMap, and removed the Key class as it is no 
>> longer needed (thus the original NPE will no longer be possible). Also with 
>> the new JMH test case, it gains some performance improvement:
>> 
>> (w/o fix)
>> 
>> Benchmark                       Mode  Cnt      Score     Error  Units
>> LocaleCache.testForLanguageTag  avgt   20   5781.275 ± 569.580  ns/op
>> LocaleCache.testLocaleOf        avgt   20  62564.079 ± 406.697  ns/op
>> 
>> (w/ fix)
>> Benchmark                       Mode  Cnt      Score     Error  Units
>> LocaleCache.testForLanguageTag  avgt   20   4801.175 ± 371.830  ns/op
>> LocaleCache.testLocaleOf        avgt   20  60394.652 ± 352.471  ns/op
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Addressing review comments

src/java.base/share/classes/sun/util/locale/BaseLocale.java line 168:

> 166:         // BaseLocale as the key. The returned "normalized" instance
> 167:         // can subsequently be used by the Locale instance which
> 168:         // guarantees the locale components are properly cased/interned.

Is it truly important that the String values are interned, or is the desire 
simply that all refer to the same "canonical" String instance?
my thought is that managing the canonical String instances could avoid 
WeakReferences and synchronized map access and just return a new BaseLocale 
each time.




    public static BaseLocale getInstance(String language, String script,
                                         String region, String variant) {
        if (script == null) {
            script = "";
        } else {
            script = getCanonicalString(script, LocaleUtils::toTitleString);
        }

        if (region == null) {
            region = "";
        } else {
            region = getCanonicalString(region, LocaleUtils::toUpperString);
        }

        if (language == null) {
            language = "";
        } else {
            language = getCanonicalString(language, LocaleUtils::toLowerString);
        }
        
        if (variant == null) {
            variant = "";
        } else {
            variant = getCanonicalString(variant, Function.identity());
        }
        ...
        return new BaseLocale(language, script, region, variant);
    }



    private static final ConcurrentMap<String, String> CANONICALIZED_STRINGS = 
new ConcurrentHashMap<>();

    static {
        // prime the old iso codes
        CANONICALIZED_STRINGS.put("he", "he");
        CANONICALIZED_STRINGS.put("iw", "iw");
        CANONICALIZED_STRINGS.put("id", "id");
        CANONICALIZED_STRINGS.put("in", "in");
        CANONICALIZED_STRINGS.put("ji", "ji");
        CANONICALIZED_STRINGS.put("yi", "yi");
    }
    
    private static String getCanonicalString(String value, Function<String, 
String> conversion) {
        String existing = CANONICALIZED_STRINGS.get(value);
        if (existing != null) {
            return existing;
        }

        String converted = conversion.apply(value);
        return CANONICALIZED_STRINGS.merge(converted, converted, (k,v) -> v);
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14404#discussion_r1232205527

Reply via email to