Hi all

7z archives store CRCs for their metadata, so a wide range of broken
archives can be identified quickly. That is far better than what you
get with ZIP, for example.
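To make the CRC point concrete, here is a small Java sketch that checks the StartHeaderCRC stored in the 32-byte signature header. It assumes the layout described in 7-Zip's 7zFormat.txt: a 6-byte signature, a 2-byte version, a little-endian UINT32 StartHeaderCRC, then a 20-byte StartHeader holding NextHeaderOffset, NextHeaderSize and NextHeaderCRC. StartHeaderCheck is a made-up name; this is not SevenZFile's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

final class StartHeaderCheck {
    // 6-byte signature + 2-byte version + 4-byte StartHeaderCRC + 20-byte StartHeader
    static final int SIGNATURE_HEADER_SIZE = 32;

    /** Returns true if the stored StartHeaderCRC matches the CRC32 of the 20-byte StartHeader. */
    static boolean startHeaderCrcMatches(byte[] header) {
        if (header.length < SIGNATURE_HEADER_SIZE) {
            throw new IllegalArgumentException("need at least 32 bytes");
        }
        // StartHeaderCRC is stored little-endian at offset 8; mask to treat it as unsigned
        long stored = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(8) & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        // The StartHeader (NextHeaderOffset, NextHeaderSize, NextHeaderCRC) spans offsets 12..31
        crc.update(header, 12, 20);
        return crc.getValue() == stored;
    }
}
```

The NextHeaderCRC inside that StartHeader then protects the metadata itself, so a reader can reject most damage without parsing anything.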

It is possible to recover from one particular kind of broken archive:
one where the archive has been written almost completely and only the
CRC and the locator of the metadata are missing. The docs mention
disks/drives being removed prematurely as a cause.
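A minimal sketch of how this "almost completely written" case might be recognized, assuming the writer reserves the signature header with zeros and only fills in the CRC and locator at the very end, so an interrupted write leaves bytes 8 to 31 all zero. MissingLocatorCheck is a hypothetical name, not part of Commons Compress.

```java
import java.util.Arrays;

final class MissingLocatorCheck {
    private static final byte[] SIGNATURE = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    /**
     * Returns true if the 32-byte signature header carries the 7z signature but the
     * StartHeaderCRC (offsets 8..11) and the StartHeader locator (offsets 12..31) are all zero.
     * Assumption: an interrupted writer leaves exactly this zero-filled placeholder behind.
     */
    static boolean looksUnfinished(byte[] header) {
        if (header.length < 32 || !Arrays.equals(Arrays.copyOf(header, 6), SIGNATURE)) {
            return false;
        }
        for (int i = 8; i < 32; i++) {
            if (header[i] != 0) {
                return false;
            }
        }
        return true;
    }
}
```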

The basic idea is to search backwards from the end of the file for the
metadata and try to parse it. This is what SevenZFile does and has
always done, and it is the root cause of
https://issues.apache.org/jira/browse/COMPRESS-542: the file ends with
something that looks like the metadata of an archive containing a huge
number of files, and allocating arrays for all of them leads to an OOM.
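The backward search could be sketched like this. It assumes that a candidate is any byte equal to the 7z property IDs kHeader (0x01) or kEncodedHeader (0x17), that the byte array holds the scanned tail of the file, and that the hypothetical tryParse callback does the expensive part of parsing metadata from a given offset. The real SevenZFile logic is more involved.

```java
import java.util.function.IntPredicate;

final class BackwardMetadataScan {
    // Assumed marker bytes: 7z property IDs kHeader and kEncodedHeader
    static final byte K_HEADER = 0x01;
    static final byte K_ENCODED_HEADER = 0x17;

    /**
     * Walks the tail from its last byte towards its first and returns the offset of the
     * first candidate for which tryParse succeeds, or -1 if no candidate parses.
     */
    static int findMetadataStart(byte[] tail, IntPredicate tryParse) {
        for (int pos = tail.length - 1; pos >= 0; pos--) {
            byte b = tail[pos];
            if ((b == K_HEADER || b == K_ENCODED_HEADER) && tryParse.test(pos)) {
                return pos;
            }
        }
        return -1;
    }
}
```

Every candidate that tryParse rejects costs a full parse attempt, which is why a parser that trusts a file count read from noise and allocates arrays up front can run out of memory.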

Current master detects corrupt archives more quickly, in particular
without excessive allocations, but it may still take quite some time to
reject thousands of candidates for "this could be the first byte of
proper metadata". We scan the last megabyte of the file, and there is
ample chance that this last megabyte contains random noise that looks
promising.
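A rough estimate of how many candidates random noise produces, assuming the scanner treats any byte equal to one of two marker values (such as kHeader 0x01 and kEncodedHeader 0x17) as a candidate and that the noise is uniformly random. CandidateNoise is a made-up name for illustration.

```java
final class CandidateNoise {
    /** Expected candidates when each byte independently matches one of markerValues byte values. */
    static double expectedCandidates(long windowBytes, int markerValues) {
        return windowBytes * (markerValues / 256.0);
    }

    /** Counts bytes equal to the assumed markers 0x01 or 0x17 in a window. */
    static int countCandidates(byte[] window) {
        int n = 0;
        for (byte b : window) {
            if (b == 0x01 || b == 0x17) {
                n++;
            }
        }
        return n;
    }
}
```

For a 1 MiB window that is 1048576 * 2 / 256 = 8192 expected candidates, each of which needs a parse attempt before it can be rejected.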

Personally I believe that almost nobody actually needs this mode of
recovery.

Therefore I think we might want to introduce an option that enables
recovery mode. If it were disabled and we found the CRC missing, we'd
throw a new, specific exception saying "you may want to try with
recovery enabled instead".
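The proposed behaviour might look roughly like this. RecoveryPolicy, Strategy and RecoveryRequiredException are hypothetical names for illustration only; none of them exist in Commons Compress, and whether the exception should extend IOException is itself an open design choice.

```java
import java.io.IOException;

/** Hypothetical exception signalling that opening might succeed with recovery enabled. */
final class RecoveryRequiredException extends IOException {
    private static final long serialVersionUID = 1L;

    RecoveryRequiredException(String message) {
        super(message);
    }
}

final class RecoveryPolicy {
    enum Strategy { USE_LOCATOR, SCAN_BACKWARDS }

    /**
     * locatorMissing: the signature header's CRC and locator are all zero.
     * recoveryEnabled: the hypothetical opt-in option proposed in this mail.
     */
    static Strategy chooseStrategy(boolean locatorMissing, boolean recoveryEnabled)
            throws RecoveryRequiredException {
        if (!locatorMissing) {
            // Normal case: follow the locator and verify the metadata CRC
            return Strategy.USE_LOCATOR;
        }
        if (!recoveryEnabled) {
            // Fail fast instead of scanning the tail of the file for candidates
            throw new RecoveryRequiredException(
                "metadata locator is missing; you may want to try with recovery enabled instead");
        }
        return Strategy.SCAN_BACKWARDS;
    }
}
```

Subclassing IOException keeps existing catch blocks working, so callers that don't know about the new option still see an IOException.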

Having recovery disabled by default would break backwards
compatibility, but it is tempting to think this could be acceptable.
I'm a bit torn here. What do you think?


Stefan
