On Tue, 5 Aug 2025 08:31:31 GMT, Volkan Yazici <vyaz...@openjdk.org> wrote:
>> Fix `HKSCS` encoder to correctly set the replacement character, and add
>> tests to verify the `CodingErrorAction.REPLACE` behavior of all available
>> encoders.
>
> test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 140:
>
>> 138:      * Finds an {@linkplain CoderResult#isUnmappable() unmappable} non-Latin-1 {@code char[]} for the given encoder.
>> 139:      */
>> 140:     private static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
>
> I'd appreciate it if you can double-check this method.

I assume that by "double char" you actually mean a "surrogate pair"?

I believe the first scanning pass should skip surrogates: a single dangling surrogate char should trigger a "malformed" error rather than an "unmappable" one, if the charset is implemented to handle supplementary characters.

    for (char c = 0xFF; c < 0xFFFF; c++) {
        if (Character.isSurrogate(c)) continue;
        if (!encoder.canEncode(c)) return new char[]{c};
    }

For the second pass, covering the surrogates, I think we can just pick any non-BMP plane. Its code points are always translated into a surrogate pair, so we can check whether the charset can map/encode each one; if it cannot, that pair is our candidate.

    for (int i = 0x10000; i < 0x1FFFF; i++) {
        char[] cc = Character.toChars(i);
        if (!encoder.canEncode(new String(cc))) return cc;
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2255682596
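
For reference, a minimal, self-contained sketch of the two-pass search suggested in this review, together with a check of the malformed-versus-unmappable distinction it relies on. The class name `UnmappableSearchSketch` and the helper `classify` are hypothetical and not part of the PR. The first pass here starts at 0x100 (the first char outside Latin-1), which is an assumption about the intended lower bound, and the second pass scans plane 1 only as an arbitrary choice of non-BMP plane.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not the code from the PR.
final class UnmappableSearchSketch {

    /**
     * Two-pass search for a non-Latin-1 input the encoder cannot map.
     * Returns null if nothing is found in the scanned ranges.
     */
    static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
        // Pass 1: BMP chars above Latin-1. Surrogates are skipped because a
        // lone surrogate is malformed input, not an unmappable character.
        for (int i = 0x100; i <= 0xFFFF; i++) {
            char c = (char) i;
            if (Character.isSurrogate(c)) continue;
            if (!encoder.canEncode(c)) return new char[]{c};
        }
        // Pass 2: supplementary code points of plane 1 (assumed sufficient
        // here); each one is represented as a surrogate pair.
        for (int cp = 0x10000; cp <= 0x1FFFF; cp++) {
            char[] pair = Character.toChars(cp);
            if (!encoder.canEncode(new String(pair))) return pair;
        }
        return null;
    }

    /** Encodes the input with REPORT actions and names the kind of result. */
    static String classify(Charset charset, char[] input) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = ByteBuffer.allocate(16);
        // endOfInput = true, so a dangling surrogate cannot be "completed" later.
        CoderResult result = encoder.encode(CharBuffer.wrap(input), out, true);
        if (result.isMalformed()) return "malformed";
        if (result.isUnmappable()) return "unmappable";
        return "ok";
    }

    public static void main(String[] args) {
        char[] found = findUnmappableNonLatin1(StandardCharsets.US_ASCII.newEncoder());
        System.out.println("US-ASCII candidate: U+"
                + Integer.toHexString(Character.codePointAt(found, 0)).toUpperCase());
        System.out.println("lone surrogate in UTF-8: "
                + classify(StandardCharsets.UTF_8, new char[]{'\uD800'}));
    }
}
```

A charset that can encode every scanned code point, such as UTF-8, yields `null` from this search, so a caller in a test would need to treat that case as "no candidate" and skip the encoder.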