Hi all,

7z archives provide CRCs for the metadata section, so you can quickly identify a wide range of broken archives, which is far better than what you get with ZIP, for example.
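To make the CRC point concrete, here is a minimal Java sketch of checking the fixed 32-byte signature header at the start of a 7z file. It assumes the layout from the 7z format description: a 6-byte signature, a 2-byte version, a 4-byte StartHeaderCRC, then NextHeaderOffset (8 bytes), NextHeaderSize (8 bytes) and NextHeaderCRC (4 bytes), all little-endian. It also assumes that an interrupted writer leaves everything after the version zeroed. StartHeaderCheck and its Status values are hypothetical names, not Commons Compress API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not part of Commons Compress.
class StartHeaderCheck {

    // "7z" followed by 0xBC 0xAF 0x27 0x1C.
    private static final byte[] SIGNATURE = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    enum Status { OK, BAD_SIGNATURE, BAD_CRC, LOCATOR_MISSING }

    static Status check(byte[] header) {
        if (header.length < 32 || !Arrays.equals(Arrays.copyOf(header, 6), SIGNATURE)) {
            return Status.BAD_SIGNATURE;
        }
        // Assumption: an interrupted writer leaves StartHeaderCRC and the
        // next-header locator (bytes 8..31) all zero.
        boolean allZero = true;
        for (int i = 8; i < 32; i++) {
            if (header[i] != 0) {
                allZero = false;
                break;
            }
        }
        if (allZero) {
            return Status.LOCATOR_MISSING;
        }
        long storedCrc = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(8) & 0xFFFFFFFFL;
        // StartHeaderCRC covers NextHeaderOffset, NextHeaderSize and NextHeaderCRC.
        CRC32 crc = new CRC32();
        crc.update(header, 12, 20);
        return crc.getValue() == storedCrc ? Status.OK : Status.BAD_CRC;
    }
}
```

This only covers the start header. The metadata itself is protected by NextHeaderCRC, which a real reader would verify after seeking to NextHeaderOffset. LOCATOR_MISSING is the situation where there is nothing to seek to, and some form of recovery is the only option left.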
It is possible to recover from a certain type of broken archive: one that has been written almost completely, where only the CRC and the locator of the metadata are missing. The docs talk about disks/drives being removed prematurely. The basic idea is to search backwards from the end of the file for the metadata and try to parse it. This is what SevenZFile does and has always done.

This is the root cause of https://issues.apache.org/jira/browse/COMPRESS-542. The file ends with something that looks like the metadata of an archive with lots and lots of files in it, and allocating the arrays for all those entries leads to an OOM. Current master detects corrupt archives more quickly, in particular without excessive allocations. Even so, it may still take quite some time to reject thousands of candidates for "this could be the first byte of proper metadata". We scan the last megabyte of the file, and there is ample chance that this last megabyte contains random noise that looks promising.

Personally I believe that almost nobody actually needs this mode of recovery. Therefore I think we might want to introduce an option that enables recovery mode. If it were disabled and we found the CRC was missing, we'd throw a new, specific exception that says "you may want to try with recovery enabled instead".

Making this new option default to disabling recovery would break backwards compatibility, but it is tempting to think this could be fine. I'm a bit torn here. What do you think?

Stefan
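For concreteness, the backward search and the proposed opt-in described above could look roughly like this. It is a sketch only. It works on an in-memory byte array rather than a channel. It assumes, following the 7z format description, that the metadata starts with either kHeader (0x01) or kEncodedHeader (0x17). The actual parse attempt is left to a caller-supplied IntPredicate. BackwardHeaderScan, findMetadata and RecoveryDisabledException are hypothetical names, not SevenZFile's real code.

```java
import java.io.IOException;
import java.util.function.IntPredicate;

// Hypothetical sketch for illustration; not SevenZFile's actual implementation.
class BackwardHeaderScan {

    static final byte K_HEADER = 0x01;          // kHeader property id
    static final byte K_ENCODED_HEADER = 0x17;  // kEncodedHeader property id
    static final int SCAN_WINDOW = 1 << 20;     // only look at the last megabyte
    static final int SIGNATURE_HEADER_SIZE = 32;

    // Hypothetical exception for the "recovery disabled" case.
    static class RecoveryDisabledException extends IOException {
        RecoveryDisabledException() {
            super("Archive has no start header; you may want to try with recovery enabled instead");
        }
    }

    // Walks backwards from the end of the data and returns the offset of the
    // first candidate byte that tryParse accepts, or -1 if none is accepted.
    static int locate(byte[] data, IntPredicate tryParse) {
        int lowest = Math.max(SIGNATURE_HEADER_SIZE, data.length - SCAN_WINDOW);
        for (int pos = data.length - 1; pos >= lowest; pos--) {
            byte b = data[pos];
            // Every 0x01 or 0x17 byte costs a full parse attempt.
            if ((b == K_HEADER || b == K_ENCODED_HEADER) && tryParse.test(pos)) {
                return pos;
            }
        }
        return -1;
    }

    // Called once the start header has been found to be missing:
    // only scan when the caller has opted in to recovery.
    static int findMetadata(byte[] data, boolean allowRecovery, IntPredicate tryParse)
            throws RecoveryDisabledException {
        if (!allowRecovery) {
            throw new RecoveryDisabledException();
        }
        return locate(data, tryParse);
    }
}
```

The cost comes from the first-byte test being so weak. In a megabyte of uniformly random data, roughly 1,048,576 × 2/256 = 8,192 bytes pass it, so that many parse attempts may be needed before an archive is rejected. With the opt-in, the default path fails fast with an exception instead.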