On 2018-02-27, Allison, Timothy B. wrote:

> On TIKA-2591[0], a user reports that a specific type of TIFF is
> being identified as a TAR file. Is this something we should try to
> fix at the Tika level, or is this something that would be better
> fixed in COMPRESS?

TAR auto-detection is, erm, clumsy. But this is because the format was not designed to be detected. This is how it works right now:

* read the first candidate header of 512 bytes
* look at the eight bytes that contain the "ustar" string and the version, and verify they look like something we support
* verify the checksum of the candidate tar header

It is extremely unlikely that you find a file that contains the literal "ustar" and a bunch of NULs and also a matching checksum at the right places, but you seem to have found one.

Of course it is possible we've got a bug, so we should look at the TIFF file and verify it really looks like a TAR. If there is no bug I'm not sure what else we could do - or what TIKA could do.

Stefan
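
For illustration only, the three checks listed above can be sketched roughly as follows. This is not the Commons Compress code. It is a small Python sketch that assumes the standard ustar header layout (magic and version at offset 257, checksum field at offset 148), accepts only the POSIX and GNU magic/version pairs, and compares against the unsigned byte sum. The names looks_like_tar, stored_checksum and computed_checksum are made up for this example.

```python
BLOCK_SIZE = 512       # size of one tar header block
MAGIC_OFFSET = 257     # 6-byte magic followed by a 2-byte version
CHKSUM_OFFSET = 148    # 8-byte checksum field, octal ASCII digits
CHKSUM_LENGTH = 8

# Magic+version pairs this sketch accepts (an assumption for the example,
# not the list the library actually supports).
KNOWN_MAGIC = (b"ustar\x0000", b"ustar  \x00")


def stored_checksum(header):
    """Parse the octal checksum field; return None if it is not valid octal."""
    field = header[CHKSUM_OFFSET:CHKSUM_OFFSET + CHKSUM_LENGTH]
    digits = field.strip(b" \x00")
    if not digits or any(d not in b"01234567" for d in digits):
        return None
    return int(digits, 8)


def computed_checksum(header):
    """Unsigned sum of all header bytes, counting the checksum field as spaces."""
    before = sum(header[:CHKSUM_OFFSET])
    after = sum(header[CHKSUM_OFFSET + CHKSUM_LENGTH:BLOCK_SIZE])
    return before + after + CHKSUM_LENGTH * ord(" ")


def looks_like_tar(data):
    """Apply the three checks to the first 512 bytes of data."""
    # 1. read the first candidate header of 512 bytes
    if len(data) < BLOCK_SIZE:
        return False
    header = data[:BLOCK_SIZE]
    # 2. the eight bytes holding "ustar" and the version must be recognised
    if header[MAGIC_OFFSET:MAGIC_OFFSET + 8] not in KNOWN_MAGIC:
        return False
    # 3. the stored checksum must match the computed one
    stored = stored_checksum(header)
    return stored is not None and stored == computed_checksum(header)
```

A check like this has only the magic/version bytes and one checksum to go on, so a non-TAR file that happens to satisfy both at the right offsets would pass it. Running something similar against the first 512 bytes of the reported TIFF would be one way to see whether the file really looks like a TAR header.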