[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923367#comment-17923367 ]
Tim Allison commented on TIKA-4375:
-----------------------------------

Y, the bmp thing is weird... {{New BMP version not implemented yet.}} This zip file has most of the bmps that caused problems: {{commoncrawl3/JK/JKMFT7XDUF7VRB6WH4D6ECD6DE6MX32T}}. It is trivially reproducible. I'll take a look.

The json, I'm not as concerned with because we have a hard time detecting json without a filename hint.

The encoding difference (which I acknowledge is wrong) comes in with the updated encoding detector. I don't like it, but I'm not sure there's much we can do.

> Regression tests for 2.9.3 release
> ----------------------------------
>
>                 Key: TIKA-4375
>                 URL: https://issues.apache.org/jira/browse/TIKA-4375
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 43R5U3BXJUDJXDZ25OAE33ZU47362WLV.zip,
> LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf, RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUDV,
> reports-tika-2.9.3-rc1.tgz, tika-2.9.2-v-tika-2.9.3-reports.tgz
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)