On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu <j...@openjdk.org> wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences. 
>> It would improve readability to see the native characters instead of escape 
>> sequences (especially for the L10n process). The majority of files changed 
>> are localized resource files.
>> 
>> This change converts the Unicode escape sequences in the JDK .properties 
>> files (both in src and test) to UTF-8 native characters. Additionally, the 
>> build logic is adjusted to read the .properties files in UTF-8 while 
>> generating the ListResourceBundle files.
>> 
>> The only escape sequence not converted was `\u0020` as this is used to 
>> denote intentional trailing white space. (E.g. `key=This is the 
>> value:\u0020`)
>> 
>> The conversion was done using native2ascii with options `-reverse -encoding 
>> UTF-8`.
>> 
>> If this PR is integrated, the IDE default encoding for .properties files 
>> needs to be updated to UTF-8. (IntelliJ IDEA locks .properties files as 
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Replace InputStreamReader with BufferedReader

Continuing the discussion that was started at a predecessor to this PR, 
https://github.com/openjdk/jdk/pull/12726#discussion_r2035582242. At least one 
incorrect conversion has been found in this PR. It might be worthwhile to 
double- and triple-check all the other conversions as well.
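
One possible way to re-check a conversion, sketched here under assumptions: 
load the pre-conversion file the legacy way (ISO-8859-1 with `\uXXXX` 
escapes) and the converted file as UTF-8, then compare the resulting 
key/value pairs. The class and method names below are hypothetical and are 
not part of the PR or the JDK build.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of the PR: compares a legacy .properties
// file with its UTF-8 converted counterpart.
class PropsCompare {

    // Legacy format: Properties.load(InputStream) decodes bytes as
    // ISO-8859-1 and resolves \uXXXX escapes.
    static Properties loadLegacy(byte[] data) throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(data));
        return p;
    }

    // Converted format: decode the bytes as UTF-8 through a Reader.
    static Properties loadUtf8(byte[] data) throws IOException {
        Properties p = new Properties();
        p.load(new InputStreamReader(new ByteArrayInputStream(data),
                StandardCharsets.UTF_8));
        return p;
    }

    // True if both files yield exactly the same keys and values.
    static boolean sameContent(byte[] legacy, byte[] converted) {
        try {
            return loadLegacy(legacy).equals(loadUtf8(converted));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PropsCompare <legacy-file> <utf8-file>");
            return;
        }
        boolean same = sameContent(Files.readAllBytes(Path.of(args[0])),
                                   Files.readAllBytes(Path.of(args[1])));
        System.out.println(same ? "identical" : "DIFFERENT");
    }
}
```

A comparison like this only shows that the two files load to the same 
values. It says nothing about whether the original escapes were correct in 
the first place.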

As part of https://bugs.openjdk.org/browse/JDK-8301971 I am trying various ways 
of detecting files without UTF-8 encoding, but it is still a bit hit and 
miss, since there is no surefire way of telling which encoding a file has, 
only heuristics. So finding and following up on potential sources of error is 
important.
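
As an illustration of one such heuristic (a minimal sketch, not necessarily 
the approach used for JDK-8301971): a strict UTF-8 decoder rejects most 
ISO-8859-1 text that contains non-ASCII bytes. The class and method names 
are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: flags files whose bytes are not well-formed UTF-8.
class Utf8Check {

    // Returns true if the bytes decode as UTF-8 with no malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a file path to check.
        for (String name : args) {
            if (!isValidUtf8(Files.readAllBytes(Path.of(name)))) {
                System.out.println("Not valid UTF-8: " + name);
            }
        }
    }
}
```

This remains a heuristic: pure ASCII files always pass, and a file that 
happens to decode cleanly may still have been written in another encoding.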

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791991649
PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791997157
