On 2009-02-11, Torsten Curdt <tcu...@apache.org> wrote:

> I am also not so sure this is really all that bad. I guess there are 3 scenarios:

> 1: the archive standard is known to use a specific encoding
> 2: the encoding is specified inside the archive (which is similar to 1)
> 3: we have no clue about the encoding of the strings in the archive

> Unless I am missing something we are fine for 1+2 as long as
> we create the strings as we should. It's up to the user of compress to
> turn this into something he can use on his platform.

> For 3 there is just no point. All we can do is provide a way to get
> the name and the user needs to figure out the conversion. Nothing we
> can do about it.

> So I guess all we need to do is to be sure not to create Strings in
> the default encoding for 1+2.

> Or what am I missing?

Not much, except that java.util.zip.Zip*putStream and thus the "old"
ZipArchiveOutputStream are always in your scenario 1: UTF-8.  This also
means they are unable to create or read anything but UTF-8.
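
To make the UTF-8 point concrete, here is a minimal sketch that writes a
one-entry archive with java.util.zip.ZipOutputStream and compares the name
bytes stored in the local file header with the UTF-8 encoding of the name.
The class and method names (ZipNameBytes, storedNameBytes) are made up for
illustration, and the fixed header offsets (26 for the name length, 30 for
the name) are taken from the ZIP format as I understand it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of java.util.zip or commons-compress.
final class ZipNameBytes {
    // Offsets within a ZIP local file header.
    private static final int NAME_LENGTH_OFFSET = 26;
    private static final int NAME_OFFSET = 30;

    /** Writes a one-entry archive and returns the raw bytes stored for the entry name. */
    static byte[] storedNameBytes(String entryName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        }
        byte[] archive = buffer.toByteArray();
        // The name length is a little-endian 16-bit value.
        int nameLength = (archive[NAME_LENGTH_OFFSET] & 0xff)
                | ((archive[NAME_LENGTH_OFFSET + 1] & 0xff) << 8);
        return Arrays.copyOfRange(archive, NAME_OFFSET, NAME_OFFSET + nameLength);
    }

    public static void main(String[] args) throws IOException {
        // "a-umlaut.txt", escaped so the source file encoding does not matter.
        String name = "\u00e4.txt";
        byte[] stored = storedNameBytes(name);
        System.out.println(Arrays.equals(stored, name.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The single-argument ZipOutputStream constructor is used on purpose, since
that is the one that gives no choice of encoding.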

The new ZipArchiveOutputStream uses the platform's default encoding, which
makes your scenario 3 more likely unless people take care to note the
encoding when creating the archive.
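
As a rough illustration of why an unrecorded encoding hurts, the sketch
below encodes a name as ISO-8859-1 (standing in for some platform default)
and decodes the same bytes twice, once with the right charset and once with
UTF-8. EntryNameDecoding and decodeName are hypothetical names used only
for this example; nothing here is commons-compress API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing that raw name bytes are only meaningful
// together with the charset that produced them.
final class EntryNameDecoding {
    /** Decodes raw entry-name bytes with a charset the caller chose explicitly. */
    static String decodeName(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        String original = "\u00e4.txt";
        // Pretend the archive was written on a platform whose default is ISO-8859-1.
        byte[] raw = original.getBytes(StandardCharsets.ISO_8859_1);

        // A reader that knows the encoding recovers the name.
        System.out.println(original.equals(decodeName(raw, StandardCharsets.ISO_8859_1)));
        // A reader that guesses UTF-8 gets a different string.
        System.out.println(original.equals(decodeName(raw, StandardCharsets.UTF_8)));
    }
}
```

Passing the Charset explicitly, rather than calling new String(bytes), is
the whole point: the second call never falls back to whatever the reading
platform happens to use.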

Note that more modern ZIP implementations provide a way to explicitly
say "this is UTF-8" inside the archive and SANDBOX-176 contains a
patch that claims to support that (as does
<https://issues.apache.org/bugzilla/show_bug.cgi?id=45548>) - I'll be
looking into it.
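
For reference, the "this is UTF-8" marker is, as far as I know, bit 11 of
the general purpose bit flag in each entry's header. A minimal sketch of
checking it on a local file header follows. Utf8FlagCheck and declaresUtf8
are hypothetical names, and this is not meant to reflect how the
SANDBOX-176 patch does it.

```java
// Hypothetical helper: inspects the general purpose bit flag of a
// ZIP local file header for the UTF-8 marker (bit 11).
final class Utf8FlagCheck {
    private static final int FLAGS_OFFSET = 6;     // after signature (4) and version (2)
    private static final int UTF8_FLAG = 1 << 11;  // 0x0800

    /** Returns true if the header declares its name and comment to be UTF-8. */
    static boolean declaresUtf8(byte[] header) {
        if (header.length < FLAGS_OFFSET + 2
                || header[0] != 0x50 || header[1] != 0x4b
                || header[2] != 0x03 || header[3] != 0x04) {
            throw new IllegalArgumentException("not a ZIP local file header");
        }
        // The flag field is a little-endian 16-bit value.
        int flags = (header[FLAGS_OFFSET] & 0xff)
                | ((header[FLAGS_OFFSET + 1] & 0xff) << 8);
        return (flags & UTF8_FLAG) != 0;
    }

    public static void main(String[] args) {
        // Hand-built header prefixes: signature, version 20, then the flag field.
        byte[] flagged = {0x50, 0x4b, 0x03, 0x04, 20, 0, 0x00, 0x08};
        byte[] unflagged = {0x50, 0x4b, 0x03, 0x04, 20, 0, 0x00, 0x00};
        System.out.println(declaresUtf8(flagged));
        System.out.println(declaresUtf8(unflagged));
    }
}
```

A reader could use such a check to pick UTF-8 when the flag is set and fall
back to a caller-supplied encoding otherwise.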

Stefan
