[ https://issues.apache.org/jira/browse/TIKA-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783406#comment-17783406 ]
Tim Allison commented on TIKA-4167:
-----------------------------------

This is the expected behavior. As you point out, most modern illustrator files are subtypes of pdf. The pdf parser identifies illustrator features and updates the mime type to illustrator.

The content-type-override is used to bypass the detector in selecting the parser.

We could prevent this behavior, but I’d like to know more about the use case. There are other parsers that refine the content-type based on a parse of the file.

> CONTENT_TYPE_USER_OVERRIDE doesn't force content type for
> application/illustrator files
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-4167
>                 URL: https://issues.apache.org/jira/browse/TIKA-4167
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.9.1
>            Reporter: Sam Stephens
>            Priority: Minor
>
> When I parse a file using AutoDetectParser, with Metadata set to
> {TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE: "application/pdf"}
> and parse a PDF-like Illustrator file
> [https://github.com/apache/tika/blob/78be82565df4cc3bbc88308be8d686019a10b899/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/resources/test-documents/testPDF_AdobeIllustrator.pdf],
> the "Content-Type" in the returned metadata is "application/illustrator",
> not "application/pdf".
> I think this is happening because "application/illustrator" is a subtype of
> "application/pdf".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)