On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu <j...@openjdk.org> wrote:
>> JDK .properties files still use ISO-8859-1 encoding with escape sequences.
>> It would improve readability to see the native characters instead of escape
>> sequences (especially for the L10n process). The majority of files changed
>> are localized resource files.
>>
>> This change converts the Unicode escape sequences in the JDK .properties
>> files (both in src and test) to UTF-8 native characters. Additionally, the
>> build logic is adjusted to read the .properties files in UTF-8 while
>> generating the ListResourceBundle files.
>>
>> The only escape sequence not converted was `\u0020` as this is used to
>> denote intentional trailing white space. (E.g. `key=This is the
>> value:\u0020`)
>>
>> The conversion was done using native2ascii with options `-reverse -encoding
>> UTF-8`.
>>
>> If this PR is integrated, the IDE default encoding for .properties files
>> needs to be updated to UTF-8. (IntelliJ IDEA locks .properties files as
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Replace InputStreamReader with BufferedReader

Continuing the discussion that was started in a predecessor to this PR,
https://github.com/openjdk/jdk/pull/12726#discussion_r2035582242.

At least one incorrect conversion has been found in this PR. It might be
worthwhile to double- and triple-check all the other conversions as well.

As part of https://bugs.openjdk.org/browse/JDK-8301971 I am trying various
ways of detecting files that are not UTF-8 encoded, but it is still a bit
hit-and-miss, since there is no surefire way of telling which encoding a
file has, only heuristics. So finding and following up on potential sources
of error is important.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791991649
PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791997157
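
To illustrate why encoding detection remains heuristic, here is a minimal sketch of one possible check: strict UTF-8 decoding that reports malformed input instead of silently replacing it. This is not the approach used in JDK-8301971 or in this PR; the class name `Utf8Check` and its method are hypothetical and chosen only for illustration. Only standard `java.nio.charset` APIs are assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Check {

    // Returns true if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café" written as ISO-8859-1: the lone byte 0xE9 is malformed UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // The same text written as UTF-8: 0xC3 0xA9 for the last character.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Two bytes that are "é" in UTF-8 but also valid ISO-8859-1 ("Ã©").
        byte[] ambiguous = {(byte) 0xC3, (byte) 0xA9};

        System.out.println(isValidUtf8(latin1));    // false
        System.out.println(isValidUtf8(utf8));      // true
        System.out.println(isValidUtf8(ambiguous)); // true, yet the intended encoding is unknowable
    }
}
```

A check like this can prove that a file is not UTF-8, but it cannot prove that a file is UTF-8: any byte sequence is valid ISO-8859-1, and pure-ASCII files pass under both encodings. That asymmetry is why manual review of the converted files still matters.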