On 2018-02-27, Allison, Timothy B. wrote:

>    On TIKA-2591[0], a user reports that a specific type of TIFF is
>    being identified as a TAR file.  Is this something we should try to
>    fix at the Tika level, or is this something that would be better
>    fixed in COMPRESS?

TAR auto-detection is, erm, clumsy. But that is because the format was
never designed to be detected.

This is how it works right now:

* read the first candidate header of 512 bytes

* look at the eight bytes that contain the "ustar" string and the
  version and verify they look like something we support.

* verify the checksum of the candidate tar header
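
As a rough sketch of those three steps (not the actual COMPRESS code):
the class and method names below are made up for illustration, the
offsets come from the ustar header layout (checksum field at byte 148,
magic and version at byte 257), and accepting only the POSIX and GNU
magic/version variants is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the Commons Compress implementation.
final class TarSniffer {
    static final int RECORD_SIZE = 512;
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;
    static final int MAGIC_OFFSET = 257;
    static final int MAGIC_AND_VERSION_LENGTH = 8;

    // Magic (6 bytes) plus version (2 bytes) as written by POSIX ustar and by GNU tar.
    static final byte[] POSIX = {'u', 's', 't', 'a', 'r', 0, '0', '0'};
    static final byte[] GNU = {'u', 's', 't', 'a', 'r', ' ', ' ', 0};

    /** Returns true if the first 512 bytes look like a tar header. */
    static boolean looksLikeTar(byte[] header) {
        // Step 1: we need one full candidate header.
        if (header == null || header.length < RECORD_SIZE) {
            return false;
        }
        // Step 2: the eight bytes holding "ustar" and the version.
        byte[] magic = Arrays.copyOfRange(
                header, MAGIC_OFFSET, MAGIC_OFFSET + MAGIC_AND_VERSION_LENGTH);
        if (!Arrays.equals(magic, POSIX) && !Arrays.equals(magic, GNU)) {
            return false;
        }
        // Step 3: the stored checksum must match the computed one.
        return storedChecksum(header) == computedChecksum(header);
    }

    /** Sums all header bytes as unsigned, counting the checksum field as spaces. */
    static long computedChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < RECORD_SIZE; i++) {
            boolean inChecksumField =
                    i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inChecksumField ? ' ' : (header[i] & 0xFF);
        }
        return sum;
    }

    /** Parses the octal checksum field; returns -1 if it is not valid octal. */
    static long storedChecksum(byte[] header) {
        long value = 0;
        boolean sawDigit = false;
        for (int i = CHKSUM_OFFSET; i < CHKSUM_OFFSET + CHKSUM_LENGTH; i++) {
            byte b = header[i];
            if (b >= '0' && b <= '7') {
                value = value * 8 + (b - '0');
                sawDigit = true;
            } else if (b == ' ' || b == 0) {
                if (sawDigit) {
                    break; // trailing NUL or space ends the number
                }
                // leading padding before the first digit is skipped
            } else {
                return -1;
            }
        }
        return sawDigit ? value : -1;
    }
}
```

Nothing in there looks at the rest of the file, so any 512 bytes that
pass both checks will be taken for a TAR header, whatever the file
really is.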

It is extremely unlikely that you would find a file that contains the
literal "ustar" and a bunch of NULs and also a matching checksum at the
right places, but you seem to have found one.

Of course it is possible we've got a bug, so we should look at the TIFF
file and verify it really looks like a TAR.  If there is no bug I'm not
sure what else we could do - or what TIKA could do.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org