Re: [IO-842] Deprecate more or undeprecate non-Charset methods

Elliotte Rusty Harold Fri, 26 Jan 2024 06:15:47 -0800

Except that in Java 18 and later we do still need it, for two independent reasons:

1. UTF-8 is still not guaranteed to be the runtime character set that
the various methods will use. JDKs can be configured to use a different
default character set. Bugs from an incorrect default character set
will now be even harder to find, since they won't be as reliably
reproducible on every system running a particular JDK.

2. Even if UTF-8 were guaranteed to be the runtime character set that
the various methods will use, that wouldn't make UTF-8 correct. It
depends on the input you're reading and the relevant specifications.
Some of these specifications use UTF-8. Some use ASCII or ISO 8859-1.
A few use UTF-16 or something else. A default character set of UTF-8
does not make any particular file or stream magically UTF-8. It is
necessary to consider the context of each input source and choose the
character encoding that is appropriate for that source.
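A minimal JDK-only sketch of both points follows. The class CharsetChoice and its methods are made-up names for illustration, not Commons IO API. It decodes the two bytes a UTF-8 writer produces for U+00E9 (e with acute accent), once through the implicit default charset and then through three explicitly named charsets. Code points are printed rather than characters so the console's own encoding can't muddy the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetChoice {

    // The two bytes a UTF-8 producer writes for U+00E9 (e with acute accent).
    static final byte[] SAMPLE = {(byte) 0xC3, (byte) 0xA9};

    // Reason 1: this silently uses Charset.defaultCharset(), which depends on
    // how the JVM was launched (for example -Dfile.encoding=COMPAT on JDK 18+).
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // The caller states which encoding the source actually uses.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render a string as "U+XXXX" code points, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("implicit:        " + codePoints(decodeWithDefault(SAMPLE)));

        // Reason 2: the same bytes are different text under different encodings.
        Charset[] candidates = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16BE
        };
        for (Charset cs : candidates) {
            System.out.printf("%-16s %s%n", cs.name() + ":", codePoints(decode(SAMPLE, cs)));
        }
    }
}
```

On JDK 18+ the implicit line normally prints U+00E9, but starting the same program with -Dfile.encoding=COMPAT on a machine whose native encoding is, for example, windows-1252 should make it print U+00C3 U+00A9 instead (reason 1). The explicit lines print U+00E9, U+00C3 U+00A9 and U+C3A9 on any JVM, because those bytes really are different text in UTF-8, ISO 8859-1 and UTF-16BE (reason 2).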

We know from decades of experience that default character sets are
unsafe and buggy. The safest approach is to provide higher-level
libraries that only accept byte streams as input and do character set
conversion themselves according to spec. This is how JSON and XML
parsers usually operate. But that's not always possible, and when it
isn't, the most secure and bug-resistant API requires developers to
think about their choice of character encoding and make their choice
explicit.
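As a rough sketch of the two API shapes described above, assuming JSON as the format: RFC 8259 requires UTF-8 for JSON exchanged between systems and lets a parser ignore a leading byte order mark, so a byte-stream-in reader can decide the encoding itself. The class TextInput and its methods are hypothetical, not Commons IO or any JSON library's API, and real parsers do more (XML, for instance, autodetects the encoding from the BOM and encoding declaration). The second method shows the fallback: when the library can't know the encoding, the Charset is a required argument with no default-charset overload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public final class TextInput {

    private TextInput() {}

    // Byte-stream-in API: the caller never chooses an encoding. This assumes
    // JSON, where RFC 8259 requires UTF-8 and lets parsers skip a leading BOM.
    public static String readJsonText(InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes(); // InputStream.readAllBytes is Java 9+
        int offset = 0;
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3; // skip the UTF-8 byte order mark
        }
        // Strict decoder: malformed input throws CharacterCodingException
        // instead of silently turning into U+FFFD replacement characters.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes, offset, bytes.length - offset)).toString();
    }

    // Fallback when the library cannot know the encoding: the Charset is a
    // required argument, and there is deliberately no overload without it.
    public static String readText(InputStream in, Charset charset) throws IOException {
        Objects.requireNonNull(charset, "charset");
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '{', '}'};
        System.out.println(readJsonText(new ByteArrayInputStream(withBom)));

        // U+00E9 encoded as ISO-8859-1 (one byte), which is not valid UTF-8 here.
        byte[] latin1 = {'"', (byte) 0xE9, '"'};
        try {
            readJsonText(new ByteArrayInputStream(latin1));
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }

        // The same byte decodes fine once the caller says what it actually is.
        String s = readText(new ByteArrayInputStream(new byte[] {(byte) 0xE9}),
                StandardCharsets.ISO_8859_1);
        System.out.println(s.codePointAt(0) == 0xE9);
    }
}
```

The design choice is the point: the strict decoder turns a wrong-encoding input into an immediate, visible error rather than a corrupted string, and readText makes the caller name the encoding instead of inheriting whatever the JVM's default happens to be.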

On Thu, Jan 25, 2024 at 5:37 PM Rob Tompkins <chtom...@gmail.com> wrote:
>
> I think we should remove the deprecations and add more explicit Javadocs that 
> spell out that there are oddities with the defaultCharset() depending upon 
> the operating system. Note this problem has been in existence since Java 1.4, 
> and we did nothing about it for a considerable amount of time. Plus in Java 
> 18 forward we simply don’t need it, as Gary said.
>
> Cheers,
> -Rob
>
> > On Jan 24, 2024, at 1:45 PM, Gary D. Gregory <ggreg...@apache.org> wrote:
> >
> > Hi All,
> >
> > In the context of https://issues.apache.org/jira/browse/IO-842 and in light 
> > of UTF-8 being the default Charset for Java 18 and up on all platforms -- 
> > https://openjdk.org/jeps/400 --, we need to figure out whether to:
> >
> > - Deprecate all non-Charset methods in favor of their Charset versions, or
> > - Un-deprecate existing deprecated non-Charset methods.
> >
> > See the ticket, please reply there or here as convenient.
> >
> > Gary
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
>

-- 
Elliotte Rusty Harold
elh...@ibiblio.org
