Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 07:31:37 GMT, Magnus Ihse Bursie wrote: >> Right, that `å` looks to have been incorrectly converted during the >> ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion >> as this change is from some time ago.) >> >> Since the change occurs in a commen

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2025-04-10 Thread Magnus Ihse Bursie
On Wed, 9 Apr 2025 21:26:15 GMT, Justin Lu wrote: >> src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties >> line 22: >> >>> 20: # Peter Smolik >>> 21: Cp1250 WINDOWS-1250 0x00FF >>> 22: # Patch attributed to hava...@underdusken.no (H�vard Wigtil) >> >> Th

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Raffaello Giulietti
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 11:46:45 GMT, Raffaello Giulietti wrote: > I guess the difference at L.1 in the various files is just the BOM? Yes. - PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037357899

RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of

RFR: 8354273: Restore even more pointless unicode characters to ASCII

2025-04-10 Thread Magnus Ihse Bursie
As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. - Commit messages: - 8354273: Restore even more pointless unicode characters to ASCII Changes: https:

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2]

2025-04-10 Thread Magnus Ihse Bursie
> As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), > I found some additional places where unicode characters are unnecessarily > used instead of pure ASCII. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revis

Re: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2]

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 10:36:31 GMT, Magnus Ihse Bursie wrote: >> As a follow-up to >> [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some >> additional places where unicode characters are unnecessarily used instead of >> pure ASCII. > > Magnus Ihse Bursie has updated the pul

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Raffaello Giulietti
On Thu, 10 Apr 2025 10:14:40 GMT, Magnus Ihse Bursie wrote: >> I have checked the entire code base for incorrect encodings, but luckily >> enough these were the only remaining problems I found. >> >> BOM (byte-order mark) is a method used for distinguishing big and little >> endian UTF-16 enc

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2025-04-10 Thread Eirik Bjørsnøs
On Thu, 10 Apr 2025 07:32:18 GMT, Magnus Ihse Bursie wrote: >> You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right >> now, where I will include a fix for this as well. > > If anything, I might be a bit worried that there are more incorrect > conversions stemming from this

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2025-04-10 Thread Magnus Ihse Bursie
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. >> It would improve readability to see the native characters instead of escape >> sequences (especially for the L10n process). The majority of files changed >> are l

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2025-04-10 Thread Eirik Bjørsnøs
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. >> It would improve readability to see the native characters instead of escape >> sequences (especially for the L10n process). The majority of files changed >> are l

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Naoto Sato
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Erik Joelsson
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Raffaello Giulietti
On Thu, 10 Apr 2025 17:09:27 GMT, Naoto Sato wrote: >> I have checked the entire code base for incorrect encodings, but luckily >> enough these were the only remaining problems I found. >> >> BOM (byte-order mark) is a method used for distinguishing big and little >> endian UTF-16 encodings.

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Naoto Sato
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Eirik Bjørsnøs
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2025-04-10 Thread Magnus Ihse Bursie
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. >> It would improve readability to see the native characters instead of escape >> sequences (especially for the L10n process). The majority of files changed >> are l

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Eirik Bjørsnøs
On Thu, 10 Apr 2025 17:23:37 GMT, Raffaello Giulietti wrote: > If this is a French name, it's e acute: é. Supported by this Wikipedia page listing S.L as an LCMS developer: https://en.wikipedia.org/wiki/Little_CMS - PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discus

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Eirik Bjørsnøs
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2025-04-10 Thread Justin Lu
On Thu, 10 Apr 2025 08:44:28 GMT, Eirik Bjørsnøs wrote: >> Justin Lu has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Replace InputStreamReader with BufferedReader > > FWIW, I checked out the revision of the commit previous to this change

RFR: 8354335: No longer deprecate wrapper class constructors for removal

2025-04-10 Thread Roger Riggs
Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short. And add `SuppressWarnings("deprecation") `where needed; and remove `SuppressWarnings("removal")` - Commit messages: - 8354335: No longer deprecate wrapper class co

Re: RFR: 8354335: No longer deprecate wrapper class constructors for removal

2025-04-10 Thread Chen Liang
On Thu, 10 Apr 2025 22:05:04 GMT, Roger Riggs wrote: > Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, > Character, Double, Float, Integer, Long, Short. > And add `SuppressWarnings("deprecation") `where needed; and remove > `SuppressWarnings("removal")` The wrapper class

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Sergey Bylokhov
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 11 May 2023 20:21:57 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. >> (Excluding the Unicode space and tab sequence). The conversion was done >> using native2ascii. >> >> In addition, the build logic is adjusted to support reading in t

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 08:08:02 GMT, Eirik Bjørsnøs wrote: >> If anything, I might be a bit worried that there are more incorrect >> conversions stemming from this PR, that my automated tools and manual >> scanning has not revealed. > > Some observations: > > 1: This PR seems to have been abondo

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily > enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little > endian UTF-16 encoding

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 19:06:35 GMT, Eirik Bjørsnøs wrote: > (BTW, I enjoyed seeing separate commits for the encoding and BOM changes, > makes it easier to verify each!) Thanks! I do very much like myself to review PRs that has separate logical commits, so I try to produce such myself. I'm glad t

Re: RFR: 8354266: Fix non-UTF-8 text encoding

2025-04-10 Thread Magnus Ihse Bursie
On Thu, 10 Apr 2025 18:30:22 GMT, Eirik Bjørsnøs wrote: >> If this is a French name, it's e acute: é. > >> If this is a French name, it's e acute: é. > > Supported by this Wikipedia page listing S.L as an LCMS developer: > > https://en.wikipedia.org/wiki/Little_CMS It's not a mistake in capita