On 2018-02-27, Stefan Bodewig wrote:

> On 2018-02-27, Allison, Timothy B. wrote:

>> On TIKA-2591[0], a user reports that a specific type of TIFF is
>> being identified as a TAR file. Is this something we should try to
>> fix at the Tika level, or is this something that would be better
>> fixed in COMPRESS?

> TAR auto-detection is, erm, clumsy. But this is due to the format not
> being built for being detected.

> This is how it works right now:

> * read the first candidate header of 512 bytes
> * look at the eight bytes that contain the "ustar" string and the
>   version and verify they look like something we support.
> * verify the checksum of the candidate tar header

Actually I was mis-reading the code. It is either "ustar and version
look good" or "parses as tar header with correct checksum". So the
chance for false positives is bigger.

Unfortunately this has proven necessary to detect all valid TAR
archives: https://issues.apache.org/jira/browse/COMPRESS-117

Stefan