Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

Magnus Ihse Bursie Thu, 10 Apr 2025 01:39:22 -0700

On Thu, 10 Apr 2025 08:08:02 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:


>> If anything, I might be a bit worried that there are more incorrect 
>> conversions stemming from this PR that my automated tools and manual 
>> scanning have not revealed.
>
> Some observations: 
> 
> 1: This PR seems to have been abandoned, so perhaps this discussion belongs 
> in #15694 ?
> 
> 2: The `å` (Unicode 'Latin small letter a with ring above' U+00E5) was 
> correctly encoded as 0xE5 in ISO-8859-1 prior to this change.
> 
> 3: The conversion changed this `0xE5` to the three-byte sequence `ef bf bd`
> 
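> 4: This is as if the file was incorrectly decoded using UTF-8, then encoded 
> using UTF-8:
> 
> ```java
> import java.nio.charset.StandardCharsets;
> import java.util.HexFormat;
> import static org.junit.jupiter.api.Assertions.assertEquals;
> 
> byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1); // single byte 0xE5
> // A lone 0xE5 is malformed UTF-8, so it decodes to U+FFFD
> String decoded = new String(origBytes, StandardCharsets.UTF_8);
> byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8);
> String hex = HexFormat.of().formatHex(encoded);
> assertEquals("efbfbd", hex);
> ```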
> 
> Like @magicus, I'm worried that similar incorrect decoding could have been 
> introduced by the same script in other files.
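
The failure mode in point 4 comes from `new String(byte[], Charset)`, which always replaces malformed input with U+FFFD instead of reporting it. A minimal sketch, assuming a conversion step could use a strict `CharsetDecoder` instead (the class and method names are hypothetical and not taken from the script used in this PR), shows how such a step would have rejected the ISO-8859-1 input outright, and how a correct conversion decodes with the file's actual charset before encoding as UTF-8. It assumes Java 17+ for `HexFormat`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class StrictDecode {
    // Hypothetical helper: decodes bytes as UTF-8, throwing instead of
    // substituting U+FFFD for malformed or unmappable input.
    static String decodeUtf8Strict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] latin1 = "å".getBytes(StandardCharsets.ISO_8859_1); // single byte 0xE5
        try {
            decodeUtf8Strict(latin1);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            // 0xE5 is a UTF-8 lead byte with no continuation bytes, so it is malformed
            System.out.println("rejected: " + e);
        }
        // Correct conversion: decode with the file's real charset, then encode as UTF-8
        byte[] utf8 = new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(HexFormat.of().formatHex(utf8)); // c3a5
    }
}
```

With a strict decoder, a script fed an ISO-8859-1 file as if it were UTF-8 stops with a `MalformedInputException` rather than writing `ef bf bd` to disk.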

> This PR seems to have been abandoned, so perhaps this discussion belongs in 
> https://github.com/openjdk/jdk/pull/15694 ?

Oh, I didn't notice this was supplanted by another PR. It might be better to 
continue there, yes. Closed PRs are seldom the best place to conduct 
discussions, but regardless of where we continue, I think it would be a good 
idea to scrutinize all files modified by this script.
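
As a first pass at that scrutiny, one could search the converted files for the byte sequence `ef bf bd` (U+FFFD encoded as UTF-8), which is the signature of the mis-decoding described above. A minimal sketch, assuming the files of interest end in `.properties` and that the class name and root-directory argument are hypothetical choices for illustration (Java 16+ for `Stream.toList`):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ReplacementCharScanner {
    // UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER
    private static final byte[] FFFD = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    // Returns every byte offset at which the EF BF BD sequence starts
    static List<Integer> findReplacementChars(byte[] data) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + FFFD.length <= data.length; i++) {
            if (data[i] == FFFD[0] && data[i + 1] == FFFD[1] && data[i + 2] == FFFD[2]) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(p -> p.toString().endsWith(".properties"))
                         .sorted()
                         .toList();
        }
        for (Path file : files) {
            List<Integer> hits = findReplacementChars(Files.readAllBytes(file));
            if (!hits.isEmpty()) {
                System.out.println(file + ": EF BF BD at byte offsets " + hits);
            }
        }
    }
}
```

This only catches one kind of damage: it would also flag a file that legitimately contains U+FFFD, and it would miss the opposite mistake (UTF-8 input decoded as ISO-8859-1, which turns `å` into `Ã¥` rather than into a replacement character), so a manual review of the diff would still be needed.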

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036820765
