On Wed, 6 Aug 2025 09:58:16 GMT, Volkan Yazici <vyaz...@openjdk.org> wrote:
>> we definitely want to exclude 'some' charsets here. yes, all unicode >> variants probably should be excluded, as they are expected to have a >> 'mapping' for every unicode character. Additionally, many charsets have an >> "internal status", meaning they might shift in and shift out its status >> based on input. See https://www.rfc-editor.org/rfc/rfc1468.html for an >> example. The encoder might/should add the shift-in/out escape sequence >> characters on top of the 'replacement', if the replacement character's >> target sub-charset does not match the 'existing' sub-charset. i would assume >> this is really out of the scope of this pr though :-) > >> if the replacement character's target sub-charset does not match the >> 'existing' sub-charset > > In such a `replacement`, does `CharsetEncoder::isLegalReplacement` still > return `true`? it's 'tricky' :-) some charsets have a default initial status, ascii-charset for example, this might trigger false return if the replacement is set without the appropriate shift-in/out esc-seq when target-sub-charset and existing charset are not matched. I'm not confident that all our implementations really handle it correctly :-) it might be interested (not really :-) given these charsets might not be that important these days and it's rare people try to change the default replacement bytes) to do full-scan-check, but again probably is not in-scope of this change. iso2022-jp is one such charset. we attempt to shift-in to the correct sub-charset by keeping the requested mode in **_implReplaceWith_** protected void implReplaceWith(byte[] newReplacement) { /* It's almost impossible to decide which charset they belong to. The best thing we can do here is to "guess" based on the length of newReplacement. */ if (newReplacement.length == 1) { replaceMode = ASCII; } else if (newReplacement.length == 2) { replaceMode = JISX0208_1983; } } then during encoding if (unmappableCharacterAction() == CodingErrorAction.REPLACE && currentMode != replaceMode) { if (dl - dp < 3) return CoderResult.OVERFLOW; if (replaceMode == ASCII) { da[dp++] = (byte)0x1b; da[dp++] = (byte)0x28; da[dp++] = (byte)0x42; } else { da[dp++] = (byte)0x1b; da[dp++] = (byte)0x24; da[dp++] = (byte)0x42; } currentMode = replaceMode; } i believe this might not be really bulletproved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2257966394