[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154103#comment-13154103 ]

Nick Burch commented on TIKA-786:
---------------------------------

In r1204435, I've added some failing (and currently disabled) unit tests for
this. If you re-enable the tests on lines 81-83 and 127-129, you'll see this
issue.
                
> Tika CLI --detect returns incorrect content-type for files with altered 
> extensions
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-786
>                 URL: https://issues.apache.org/jira/browse/TIKA-786
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.1
>         Environment: Windows
>            Reporter: John Mastarone
>            Priority: Minor
>
> From a discussion on the user mailing list on Nov. 11, 2011, where the
> following was requested as a new bug: Tika CLI returns incorrect content
> type information when called with --detect for files that have had their
> extensions modified (and nothing else). MS Word (.doc) documents that have
> their extension changed to .xls or .ppt are incorrectly detected as Excel
> or PowerPoint documents, whereas the --metadata option determines the
> content type correctly (as application/msword), based on the actual contents
> of these misnamed files. The same also occurs with other types of MS Office
> 2003 documents, and could possibly occur with a wide range of document types.
> To quote Nick B., from the user mailing list: "If you look at the
> TestMediaTypes class you'll see what you can get with just the mime magic and
> filenames, and then there's TestContainerAwareDetector which shows the
> correct detection happening by using the extra detectors available".
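
The difference described above can be illustrated with a small, self-contained
Java sketch. It is a toy model, not Tika's code: the class and method names
(DetectionSketch, detectByMagicAndName, detectByContainer) are hypothetical,
and the fallback type string is only used here for illustration. It relies on
two facts about the OLE2 format used by Office 2003 files: .doc, .xls and .ppt
all start with the same eight signature bytes, and they differ in the names of
the streams stored inside the container (for example "WordDocument" versus
"Workbook"). With only magic bytes and a file name, the name is the sole thing
that can pick a subtype, so a renamed file is misreported; looking at the
entries inside the container gives the right answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: a toy model of two detection strategies, not Tika code. */
public class DetectionSketch {

    // Signature at the start of every OLE2 compound file;
    // .doc, .xls and .ppt files all share it.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Fallback used in this sketch when no subtype can be chosen (assumption).
    static final String GENERIC_OLE2 = "application/x-tika-msoffice";

    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Magic bytes plus file name: only the name can select a subtype. */
    static String detectByMagicAndName(byte[] header, String fileName) {
        if (!isOle2(header)) {
            return "application/octet-stream";
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".doc")) return "application/msword";
        if (name.endsWith(".xls")) return "application/vnd.ms-excel";
        if (name.endsWith(".ppt")) return "application/vnd.ms-powerpoint";
        return GENERIC_OLE2;
    }

    /** Container-aware: decide from the stream names inside the container. */
    static String detectByContainer(byte[] header, Set<String> entryNames) {
        if (!isOle2(header)) {
            return "application/octet-stream";
        }
        if (entryNames.contains("WordDocument")) return "application/msword";
        if (entryNames.contains("Workbook") || entryNames.contains("Book")) {
            return "application/vnd.ms-excel";
        }
        if (entryNames.contains("PowerPoint Document")) {
            return "application/vnd.ms-powerpoint";
        }
        return GENERIC_OLE2;
    }

    public static void main(String[] args) {
        // A Word document whose extension was changed to .xls.
        byte[] header = OLE2_MAGIC.clone();
        Set<String> entries = new HashSet<>(Arrays.asList("WordDocument", "1Table"));

        System.out.println(detectByMagicAndName(header, "report.xls")); // application/vnd.ms-excel
        System.out.println(detectByContainer(header, entries));         // application/msword
    }
}
```

In this sketch the entry names are passed in directly; a real detector would
have to open the container and read its directory to obtain them, which is the
extra work that magic-and-filename detection skips.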

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
