If I explain the scenario in more detail then it might become clearer.

I am seeing issues with certain zip files and file formats based on zip (such
as docx and xps). We are reading these files from a stream, so we are using
ZipArchiveInputStream.

What I see is that we loop, fetching each entry with getNextZipEntry, until
we get a null and stop. All looks good. However, we have only extracted 1
or 2 entries out of a known 20 or 30, whereas the file-based extractor
extracts all of them.

I cannot provide an example of a file as the examples I have are all
customer owned. However every xps file I have seen suffers the issue:

http://www.microsoft.com/whdc/xps/xpssampdoc.mspx

I have investigated the issue and it is caused by entries that rely on the
central directory, i.e. entries written with a data descriptor. In the zip
stream reader the size, csize and crc fields of such an entry are all zero,
and there is no central directory available to the reader, so it performs no
extraction. This means the next call to getNextZipEntry is incorrectly
positioned and fails the check of the entry signature (LFH_SIG). It returns
null, and to the calling code it appears that we have succeeded.
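
To make the failure mode concrete, here is a minimal sketch that inspects a
local file header directly. It uses only java.util.zip to build a test
archive, not Commons Compress, and the class and method names
(LocalHeaderProbe, usesDataDescriptor, compressedSizeInHeader) are made up for
illustration. The offsets follow the local file header layout in the zip
appnote: flags at byte 6, csize at byte 18.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Commons Compress.
class LocalHeaderProbe {
    static final int LFH_SIG = 0x04034b50;          // "PK\003\004" read little-endian
    static final int DATA_DESCRIPTOR_FLAG = 1 << 3; // general purpose bit 3

    // Builds a one-entry archive in memory. The sizes of a DEFLATED entry are
    // not known up front, so ZipOutputStream writes them in a data descriptor.
    static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello hello hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Wraps the archive bytes and verifies a local file header starts at offset.
    static ByteBuffer header(byte[] zip, int offset) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(offset) != LFH_SIG) {
            throw new IllegalArgumentException("no local file header at offset " + offset);
        }
        return b;
    }

    // True if the entry defers crc, csize and size to a data descriptor.
    static boolean usesDataDescriptor(byte[] zip, int offset) {
        int flags = header(zip, offset).getShort(offset + 6) & 0xFFFF;
        return (flags & DATA_DESCRIPTOR_FLAG) != 0;
    }

    // The compressed size as recorded in the local header (unsigned 32 bit).
    static long compressedSizeInHeader(byte[] zip, int offset) {
        return header(zip, offset).getInt(offset + 18) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip();
        System.out.println("bit 3 set: " + usesDataDescriptor(zip, 0));
        System.out.println("csize in header: " + compressedSizeInHeader(zip, 0));
    }
}
```

A stream reader that trusts the csize from the header has no way of knowing
how many bytes to skip for such an entry, which matches what I am seeing.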

So my two change requests are simply to enable me to validate entries and
detect these types of stream so I can do something appropriate. Compress 1.1
adds support for identifying encrypted entries, which I need, hence the
similar request to identify entries that use a data descriptor.

The second request is to not return a null when this type of error occurs
but indicate the error somehow. There might be issues here (I am no zip
expert) but I would be worried about false errors being reported.
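
As a rough sketch of what I have in mind for the second request, the reader
could distinguish "the central directory starts here" from "this is not a
record I recognise" instead of mapping both to null. The class, enum and
method names below are hypothetical, and the sketch assumes the only record
that legitimately follows the last entry is a central directory file header.
It ignores other records the appnote allows (for example an empty archive
that starts with the end of central directory record), which is exactly where
false errors could creep in.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Commons Compress.
class SignatureCheck {
    static final int LFH_SIG = 0x04034b50; // local file header
    static final int CFH_SIG = 0x02014b50; // central directory file header

    enum Next { ENTRY, END_OF_ENTRIES, UNEXPECTED }

    // Classifies the four bytes at offset, where the next record should begin.
    static Next classify(byte[] buf, int offset) {
        if (buf.length - offset < 4) {
            return Next.UNEXPECTED; // truncated stream
        }
        int sig = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        if (sig == LFH_SIG) {
            return Next.ENTRY;
        }
        if (sig == CFH_SIG) {
            return Next.END_OF_ENTRIES; // clean end: returning null is fine here
        }
        return Next.UNEXPECTED; // mispositioned or corrupt: report it, do not return null
    }

    public static void main(String[] args) {
        byte[] garbage = {0x12, 0x34, 0x56, 0x78};
        System.out.println(classify(garbage, 0));
    }
}
```

The calling code could then throw an exception on UNEXPECTED, so that a
partially read archive is no longer indistinguishable from a complete one.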

Simon


On 11/03/2010 13:11, "Stefan Bodewig" <bode...@apache.org> wrote:

> On 2010-03-10, Simon Tyler <sty...@mimecast.net> wrote:
> 
>> Do we have a date yet for the compress 1.1 release?
> 
> What Christian said.
> 
>> Also, is there time to add a couple of minor feature enhancements? I could
>> do with access to the following:
> 
>> 1. A public method to check if a ZipArchiveInputStream has a data
>> descriptor (e.g. return hasDataDescriptor).
> 
> This is a property of the individual entry, not the stream as a whole,
> isn't it?  Why would you want to know that (just curious)?  We could
> probably make the general purpose flags available and you could look at
> bit 3.
> 
>> 2. Better handling when ZipArchiveInputStream is used to read such streams.
>> Currently it fails silently when it hits an invalid LFH_SIG, by
>> returning null.
> 
> I'm not sure what you mean, could you describe what happens under what
> circumstances in more detail?  I see that the data descriptor isn't read
> anywhere and I see that the stream may fail if the data descriptor uses
> the "unofficial signature" mentioned in appnote, is this what you mean?
> 
> Stefan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 



