Hi Stefan,

  My comments follow.

Stefan Bodewig wrote:
> Let me try to capture the various threads in SANDBOX-176 and from this
> list into something we can draw conclusions from.
> 
> First some background:
> ======================

[snip]

> Reading
> =======
> 
> Let's keep ZipArchiveInputStream out of the discussion for now 8-)

Yes, we should. I analysed my WinZip example and found that the Unicode
extra fields are written to the central directory records, not to the local
file headers. This makes it impossible to get the real Unicode filename when
parsing a ZIP file the way all the ZipInputStream implementations I've seen
do. (They parse the local file headers sequentially and ignore the central
directory records...)

Furthermore, relicensing any GPL version of java.util.zip.ZipInputStream
seems to be impossible because of the large number of contributors to the
code out there. (I've tried to track down the contributors to GNU Classpath's
version; there is almost no chance of finding them all...)

> I propose to change ZipFile to support both the EFS flag as well as
> the InfoZIP extra fields when reading archives.

That's a good choice. I've already provided the parsing code for the Unicode
extra fields, so the implementation should be quite easy ;-)

> I'm not sure what ZipFile should do if it encounters both the EFS flag
> and the extra fields.  Likely it is best to assume both hold the same
> information and simply use the EFS encoded name.

Agreed.
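
A rough sketch of that precedence, just to make it concrete. The class and
method names are made up for illustration and are not existing ZipFile API;
the only fixed fact is that EFS is bit 11 of the general purpose bit flag.
The Unicode extra field name is assumed to be parsed already (null if the
entry has none).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ZipNameDecoding {
    /** Bit 11 of the general purpose bit flag: name and comment are UTF-8 (EFS). */
    static final int EFS_FLAG = 1 << 11;

    /**
     * Hypothetical helper: picks the entry name in the order discussed above.
     * unicodeExtraName is the name taken from an InfoZIP Unicode extra field,
     * or null if the entry has none.
     */
    static String decodeName(int generalPurposeFlags, byte[] rawName,
                             String unicodeExtraName, Charset defaultEncoding) {
        if ((generalPurposeFlags & EFS_FLAG) != 0) {
            // EFS wins, even if an extra field is present as well.
            return new String(rawName, StandardCharsets.UTF_8);
        }
        if (unicodeExtraName != null) {
            return unicodeExtraName;
        }
        // Neither hint present: fall back to the configured default encoding.
        return new String(rawName, defaultEncoding);
    }
}
```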

> The question is what ZipFile should assume as its default if neither
> the EFS nor extra fields are present.  This can be controlled by
> "setEncoding" right now and defaults to the platform's default
> encoding but a default of UTF-8 (compatible with java.util.zip) or
> CodePage 437 (compatible with formal ZIP spec) are valid choices as
> well.

AFAICS, Ant API users are used to the 'setEncoding(String encoding)' approach,
although it would be better to rename the method to 'setDefaultEncoding(String
encoding)'.

> Writing
> =======
> 
> I propose new flags get/setLanguageEncodingFlag for EFS and
> get/setAddUnicodeExtraFields on ZipArchiveOutputStream that control
> whether either approach is used.  I.e. I propose to optionally support
> either approach (and both at the same time).

The question at this point is whether to use the EFS flag for *all* records or
only for records not encodable in the encoding set by 'setEncoding(String)'.

IMHO we should adopt the 7-Zip approach and set the EFS flag only for
non-encodable records, since this approach is minimally invasive.

Of course, the EFS flag should be set for all records if the encoding is set to
UTF-8.
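
Something along these lines is what I have in mind. It is only a sketch: the
class and method names are invented, and it relies on nothing but the
standard CharsetEncoder.canEncode check to decide whether a name fits the
configured encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EfsPolicy {
    /**
     * Hypothetical per-entry decision: true if the entry name should be
     * written as UTF-8 with the EFS flag set.
     */
    static boolean useEfs(String name, Charset configured) {
        if (StandardCharsets.UTF_8.equals(configured)) {
            // Encoding is UTF-8 anyway, so flag every record.
            return true;
        }
        // Otherwise flag only the names the configured encoding cannot represent.
        return !configured.newEncoder().canEncode(name);
    }
}
```

With ISO-8859-1 configured, a name like "readme.txt" would stay unflagged,
while a name containing, say, Cyrillic characters would be written as UTF-8
with the flag set.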

> IMHO the main question is what the code should do by default.
> 
> Currently I think the best default approach would be to use UTF-8 as
> the default encoding and set the EFS bit since this will create
> archives compatible with java.util.zip but has the additional benefit
> of clearly stating it is using UTF-8.

Yes, this seems reasonable, because users will expect Java compatibility
first and foremost.

> Note that using the EFS bit may make the archive unreadable for old
> archivers, that's why we need the option to turn it off.

I've not seen an old archiver that refused to unpack such a file. The only
problem is that the file names of the unpacked files are wrong (UTF-8
interpreted as CP437; the good news is that all code points from 0x80-0xff in
CP437 are allocated). However, that's the same problem that arises when
unpacking a file created by java.util.zip.ZipOutputStream.
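
The effect is easy to reproduce. The snippet below is only a demonstration
with an invented class name, and it assumes the JRE ships the "IBM437"
charset (it is part of the extended charsets in a full JDK). Every UTF-8 byte
becomes exactly one CP437 character, so the name gets longer but never fails
to decode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp437Mojibake {
    /** Decodes the UTF-8 bytes of a name the way a CP437-only archiver would. */
    static String misread(String name) {
        Charset cp437 = Charset.forName("IBM437"); // assumed to be available
        return new String(name.getBytes(StandardCharsets.UTF_8), cp437);
    }

    public static void main(String[] args) {
        // "\u00e4" (a-umlaut) is two bytes in UTF-8, so it shows up as two characters.
        System.out.println(misread("\u00e4.txt"));
    }
}
```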

> I wouldn't add the InfoZIP extra fields by default since they increase
> the archive size.

Yes, that's fine.

How about my suggestion for a 'tuning' method that sets up the ZipOutputStream
in a way that's suitable for most unzip tools out in the wild?

Or should we put all the knowledge we gathered in SANDBOX-176 and in this
thread into the JavaDoc of the class?

  Regards,

    Wolfgang

