Re: RFR: 8195686: ISO-8859-8-i charset cannot be decoded, should be mapped to ISO-8859-8

Justin Lu Thu, 10 Oct 2024 09:54:40 -0700

On Thu, 10 Oct 2024 16:32:48 GMT, Naoto Sato <na...@openjdk.org> wrote:


> Sorry, but I still don't believe that making "ISO-8859-8-I" as an alias to 
> "ISO-8859-8" is the right solution, per the IANA character sets definition 
> (https://www.iana.org/assignments/character-sets/character-sets.xhtml). The 
> current PR would make "ISO-8859-8-I" charset appear in 
> `Charset.forName("ISO-8859-8").aliases()`, but not in 
> `Charset.availableCharsets()` which is deemed incorrect to me.


I agree. From the Charset specification,

> If a charset listed in the IANA Charset Registry is supported by an 
> implementation of the Java platform then its canonical name must be the name 
> listed in the registry. Many charsets are given more than one name in the 
> registry, in which case the registry identifies one of the names as 
> MIME-preferred. If a charset has more than one registry name then its 
> canonical name must be the MIME-preferred name and the other names in the 
> registry must be valid aliases.

Practically speaking, it does seem to be an alias, but implementing it as such 
would violate the Charset specification, because the IANA registry lists 
ISO-8859-8-I as a charset in its own right. So either defining a new Charset 
for ISO-8859-8-I (if there is sufficient demand) or, as Naoto pointed out, 
using a CharsetProvider would seem like an appropriate solution to me. An 
advantage of the SPI solution is that you can also easily include the other 
implicit/explicit bidi Charsets.
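
For illustration, here is a rough sketch of what the SPI route could look like. 
It is not the code from the PR: the class names `BidiCharsetProvider` and 
`Iso8859_8_I` are hypothetical, no aliases are declared, and it assumes the 
runtime already supports ISO-8859-8 so that byte-level coding can simply be 
delegated to it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical provider: exposes ISO-8859-8-I as its own charset (not an alias).
// Declare it public in its own file when registering it as a service.
class BidiCharsetProvider extends CharsetProvider {

    // Hypothetical charset: same byte mapping as ISO-8859-8, distinct canonical name.
    static final class Iso8859_8_I extends Charset {
        // Assumption: ISO-8859-8 is available in this runtime.
        private final Charset delegate = Charset.forName("ISO-8859-8");

        Iso8859_8_I() {
            super("ISO-8859-8-I", null); // no aliases in this sketch
        }

        @Override
        public boolean contains(Charset cs) {
            return cs.name().equals(name()) || delegate.contains(cs);
        }

        // Note: the returned coders report ISO-8859-8 from their charset() method.
        @Override
        public CharsetDecoder newDecoder() {
            return delegate.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return delegate.newEncoder();
        }
    }

    private static final Charset BIDI_IMPLICIT = new Iso8859_8_I();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.<Charset>singleton(BIDI_IMPLICIT).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        // Charset names are case-insensitive.
        return BIDI_IMPLICIT.name().equalsIgnoreCase(charsetName) ? BIDI_IMPLICIT : null;
    }
}
```

To have `Charset.forName("ISO-8859-8-I")` pick this up, the provider would be 
listed in a `META-INF/services/java.nio.charset.spi.CharsetProvider` file on 
the class path. The charset would then show up in 
`Charset.availableCharsets()` under its own canonical name, and the aliases of 
ISO-8859-8 would stay untouched.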

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20690#issuecomment-2405607186
