Thanks for your reply. I've opened a PR [1] where I hope to continue discussing this issue.
Glavo

[1] https://github.com/openjdk/jdk/pull/13434

On Tue, Apr 11, 2023 at 9:02 PM Quân Anh Mai <anh...@gmail.com> wrote:

> Hi,
>
> To propose deprecation of String::toLowerCase() and String::toUpperCase(), you can create a patch as normal, with the addition of a CSR ticket that describes the situation and the proposed solution. After that, you can ask someone from core-libs to review the ticket. The change can be merged after sufficient reviews and once the CSR has been approved. You can view other CSRs in JBS to see the overall structure, as well as other deprecations in the JDK to see a typical deprecation description. More details regarding CSRs can be found in the OpenJDK Wiki <https://wiki.openjdk.org/display/csr/Main>.
>
> Hope this helps,
> Quan Anh
>
> On Mon, 10 Apr 2023 at 00:47, Glavo <zjx001...@gmail.com> wrote:
>
>> Hi,
>>
>> We discussed this issue on this mailing list [1] earlier this year.
>>
>> I investigated the usage of these two methods and found that all use cases within the JDK are suspicious, resulting in many imperceptible bugs.
>>
>> I hope to create a PR for this issue, deprecate these two methods, and create alternative methods for them. But I don't have experience making such changes, so I may need some guidance, or more experienced people may need to do this.
>>
>> Glavo
>>
>> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-January/099375.html
>>
>> On Sun, Apr 9, 2023 at 10:58 PM <some-java-user-99206970363698485...@vodafonemail.de> wrote:
>>
>>> Hello,
>>> could you please add String & Character ASCII case conversion methods, that is, methods which only perform case conversion on ASCII characters in the input and leave any other characters unchanged. The conversion should not depend on the default locale.
>>> For example:
>>> - String:
>>>   - toAsciiLowerCase
>>>   - toAsciiUpperCase
>>>   - equalsAsciiIgnoreCase (or a better name)
>>>   - compareToAsciiIgnoreCase (or a better name)
>>> - Character:
>>>   - toAsciiLowerCase
>>>   - toAsciiUpperCase
>>>
>>> This would give the following advantages:
>>> - Increased performance (and no exposure to denial-of-service attacks)
>>> - Reduced number of bugs in applications
>>>
>>> Please read on for a detailed explanation.
>>>
>>> I assume that for historic reasons (Applets) the current case conversion methods use the Unicode conversion rules, and, even worse, String.toLowerCase() and String.toUpperCase() use the default locale. While this might have been a reasonable choice back then, because Applets ran locally in the browser and localization was a nice-to-have feature (or even a requirement), nowadays Java is largely used for back-end systems, and case conversion is often applied to technical strings rather than display text. In this context applications mostly process ASCII strings.
>>> However, because Java does not offer any specific case conversion methods for these cases, users still use the standard String & Character methods. This causes the following problems [1]:
>>>
>>> - String.toLowerCase() & String.toUpperCase() using the default locale
>>>   What this means is that, depending on the OS locale, your application might behave differently or fail [2]. For the scale of this, simply look in the OpenJDK database: https://bugs.openjdk.org/issues/?jql=text ~ "turkish locale"
>>>   At this point you probably have to add a disclaimer to any Java program that running it on systems with a Turkish (and possibly other) locale is not supported, because either your own code or the libraries you are using might be calling toLowerCase() or toUpperCase() [3].
>>>
>>> - Bad performance for Unicode-aware case conversions
>>>   Compared to simply performing ASCII case conversion, applying Unicode case conversion has worse performance. In some cases it can even have extremely bad performance (JDK-8292573). This could have security implications.
>>>
>>> - Bugs due to case conversion changing string length
>>>   Unicode case conversion for certain strings can change the length, either increasing or decreasing the size of the string (or, when combining both, shifting the position of characters in the string while keeping the length the same). If an application assumes that the length of the string remains the same and uses data derived from the original string (e.g. character indices or length) on the converted string, this can lead to exceptions or potentially even security issues.
>>>
>>> - Unicode characters mapping to ASCII chars
>>>   When performing case conversion on certain non-ASCII Unicode characters, the results are ASCII characters. For example, `Character.toLowerCase('\u212A') == 'k'`. This could have security implications.
>>>
>>> - Update to Unicode data changing application behavior
>>>   Unicode evolves over time, and the JDK regularly updates the Unicode data it is using. Even if an application is not affected by the issues mentioned above, it could become affected by them when the Unicode data is updated in a newer JDK version.
>>>
>>> My main point here is that (I assume) in many cases Java applications don't need Unicode case conversion, let alone Unicode case conversion using the default locale. If Java offered ASCII-only case conversion methods, then hopefully users would (where applicable) switch to these methods over time and avoid all the issues mentioned above.
>>> And even if they accidentally use the ASCII-only methods for display text, the result would likely be a minor inconvenience for users seeing that text, compared to the application bugs and security vulnerabilities possible in the other cases.
>>>
>>> Related information about other programming languages:
>>> - Rust: Has dedicated methods for ASCII case conversion, e.g. https://doc.rust-lang.org/std/string/struct.String.html#method.to_ascii_lowercase
>>> - Kotlin: Functions which implicitly use the default locale were deprecated, see https://youtrack.jetbrains.com/issue/KT-43023
>>>
>>> Risks:
>>> - ASCII case conversion could lead to undesired results in some cases; see the example for the word "café" on https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html (though that specific example is about a display string, for which these ASCII-only methods are not intended)
>>> - When applications start to mix ASCII-only and the existing Unicode conversion methods, this could lead to bugs and security issues as well; though it might also indicate a flaw in the application if it performs case conversion on the same value in different places
>>>
>>> I hope you consider this suggestion. Feedback is appreciated!
>>>
>>> Kind regards
>>>
>>> ----
>>>
>>> [1] I am not saying, though, that Java is the only affected language; the problem definitely affects others as well. But that should not prevent improving the Java API.
>>> [2] Tool for detecting usage of such methods: https://github.com/policeman-tools/forbidden-apis
>>> [3] Maybe it would also be worth discussing deprecating String.toLowerCase() and String.toUpperCase() because they seem to do more harm than good.
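The ASCII-only conversion requested in the quoted proposal can be sketched in plain Java. This is a minimal sketch, not a JDK API: the class `AsciiCase` is hypothetical, and its methods only borrow the names suggested in the thread. The `main` method also reproduces three of the problems listed above with existing `String`/`Character` methods. For the Turkish case it passes an explicit `Locale` instead of changing the default locale; `toLowerCase()` with no argument behaves the same way on a system whose default locale is Turkish.

```java
import java.util.Locale;

/**
 * Hypothetical ASCII-only case conversion helpers (NOT part of the JDK).
 * Method names follow the proposal in the thread; only 'A'..'Z' and
 * 'a'..'z' are converted, every other char is left unchanged.
 */
final class AsciiCase {
    private AsciiCase() {}

    // Distance between 'a' and 'A' in ASCII (32).
    private static final int DELTA = 'a' - 'A';

    static char toAsciiLowerCase(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + DELTA) : c;
    }

    static char toAsciiUpperCase(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - DELTA) : c;
    }

    /** One char in, one char out, so the result has the input's length. */
    static String toAsciiLowerCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toAsciiLowerCase(chars[i]);
        }
        return new String(chars);
    }

    static String toAsciiUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toAsciiUpperCase(chars[i]);
        }
        return new String(chars);
    }

    /** Non-ASCII chars must match exactly; ASCII letters match ignoring case. */
    static boolean equalsAsciiIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (toAsciiLowerCase(a.charAt(i)) != toAsciiLowerCase(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Lexicographic order of ASCII-lower-cased chars, shorter string first on a tie. */
    static int compareToAsciiIgnoreCase(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = toAsciiLowerCase(a.charAt(i));
            char cb = toAsciiLowerCase(b.charAt(i));
            if (ca != cb) {
                return ca - cb;
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        // Locale sensitivity: in Turkish, 'I' lower-cases to dotless i (U+0131).
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // false
        System.out.println(toAsciiLowerCase("TITLE").equals("title"));    // true

        // Length change: sharp s (U+00DF) upper-cases to "SS".
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT).length()); // 7
        System.out.println(toAsciiUpperCase("stra\u00DFe").length());        // 6

        // Non-ASCII to ASCII: KELVIN SIGN (U+212A) lower-cases to 'k'.
        System.out.println(Character.toLowerCase('\u212A') == 'k'); // true
        System.out.println(toAsciiLowerCase('\u212A') == 'k');      // false
    }
}
```

Because each `char` maps to exactly one `char`, the output always keeps the input's length, and non-ASCII characters such as U+212A pass through unchanged. The trade-off is the one named under "Risks": a word like "café" upper-cases to "CAFé" rather than "CAFÉ".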