On Thu, 10 Apr 2025 08:08:02 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:
>> If anything, I might be a bit worried that there are more incorrect >> conversions stemming from this PR, that my automated tools and manual >> scanning has not revealed. > > Some observations: > > 1: This PR seems to have been abondoned, so perhaps this discussion belongs > in #15694 ? > > 2: The `å` (Unicode 'Latin small letter a with ring above' U+00E5) was > correctly encoded as 0xEF in ISO-8859-1 previous to this change. > > 3: The conversion changed this `0xEF` to the three-byte sequence `ef bf bd` > > 4: This is as-if the file was incorrctly decoded using UTF-8, then encoded > using UTF-8: > > > byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1); > String decoded = new String(origBytes, StandardCharsets.UTF_8); > byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8); > String hex = HexFormat.of().formatHex(encoded); > assertEquals("efbfbd", hex); > ``` > > Like @magicus I'm worried that similar incorrect decoding could have been > introduced by the same script in other files. > This PR seems to have been abondoned, so perhaps this discussion belongs in > https://github.com/openjdk/jdk/pull/15694 ? Oh, I didn't notice this was supplanted by another PR. It might be better to continue there, yes. Even if closed PRs seldom are the best places to conduct discussions, I think it might be a good idea to scrutinize all files modified by this script. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036820765