[ 
https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575279#comment-17575279
 ] 

Tim Allison commented on TIKA-3827:
-----------------------------------

My guess is that Aspose is doing the correct post-processing to yield a PNG.
This happens commonly in PDFs, where the raw image bytes stored in the file
have to be combined by the application with other information in the document
(such as the image dimensions and color space) to yield an actual image file.
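To illustrate what such post-processing can look like, here is a minimal Python sketch for one common case: a headerless Windows DIB (a BITMAPINFOHEADER followed by an optional color table and the pixels, with no 14-byte BITMAPFILEHEADER in front). Whether the images in example.DOC are stored this way is an assumption that this issue does not confirm, and `dib_to_bmp` is a hypothetical helper, not part of Tika or Aspose.

```python
import struct

FILE_HEADER_SIZE = 14  # size of a BITMAPFILEHEADER
BI_BITFIELDS = 3

def dib_to_bmp(dib: bytes) -> bytes:
    """Hypothetical helper: turn a headerless DIB into a standalone .bmp file."""
    if len(dib) < 40:
        raise ValueError("too short to hold a BITMAPINFOHEADER")
    header_size = struct.unpack_from("<I", dib, 0)[0]   # biSize
    if header_size < 40:
        raise ValueError("only BITMAPINFOHEADER or later headers are handled")
    bit_count = struct.unpack_from("<H", dib, 14)[0]    # biBitCount
    compression = struct.unpack_from("<I", dib, 16)[0]  # biCompression
    colors_used = struct.unpack_from("<I", dib, 32)[0]  # biClrUsed
    # Palettized images (1-8 bpp) default to a full palette when biClrUsed is 0.
    if colors_used == 0 and 1 <= bit_count <= 8:
        colors_used = 1 << bit_count
    extra = colors_used * 4  # each RGBQUAD palette entry is 4 bytes
    # With a plain 40-byte header, BI_BITFIELDS adds three 4-byte color masks.
    if compression == BI_BITFIELDS and header_size == 40:
        extra += 12
    pixel_offset = FILE_HEADER_SIZE + header_size + extra
    file_size = FILE_HEADER_SIZE + len(dib)
    # bfType, bfSize, bfReserved1, bfReserved2, bfOffBits (little-endian)
    file_header = struct.pack("<2sIHHI", b"BM", file_size, 0, 0, pixel_offset)
    return file_header + dib
```

The output is the input plus 14 bytes, with the pixel data untouched. That is the "add a header" kind of fix-up, as opposed to a real re-encoding into another format such as PNG.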

Are you able to attach the PNGs? I'm curious whether they just slapped a PNG
header on those bytes or actually transformed them.
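One way to answer that question once the files are attached is to look at the leading bytes. The Python sketch below is a simplified magic-byte check, not Tika's actual detection logic (Tika uses its own MIME magic rules). A genuine PNG must have an `IHDR` chunk of length 13 right after the 8-byte signature; a bare signature in front of other data fails that check. A leading `0xFF` byte followed by a byte with its top three bits set matches the 11-bit MPEG audio frame sync, which is one plausible (assumed, unconfirmed) reason arbitrary image bytes could be labeled as mpga. The function names `sniff` and `raw_bytes_reused` are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def sniff(data: bytes) -> str:
    """Rough classification of a blob by its first bytes (simplified)."""
    if data.startswith(PNG_SIGNATURE):
        # A real PNG continues with a 13-byte IHDR chunk: length, then type.
        if data[8:16] == struct.pack(">I", 13) + b"IHDR":
            return "png"
        return "png-signature-only"
    if data.startswith(b"BM"):
        return "bmp"
    if data.startswith(b"ID3"):
        return "mpeg-audio"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        # 11-bit MPEG audio frame sync; a weak signal that random data can match.
        return "mpeg-audio-frame-sync"
    return "unknown"

def raw_bytes_reused(png: bytes, raw: bytes) -> bool:
    """True if the raw extracted bytes appear verbatim inside the PNG.

    A genuine conversion zlib-compresses pixel data into IDAT chunks, so
    uncompressed raw bytes rarely survive verbatim.
    """
    return len(raw) > 0 and raw in png
```

If `raw_bytes_reused` returns True for the Aspose output and one of the mpga files, that points toward a header being added rather than a real transformation.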

> Word Document extracted mpga file extension instead of bitmap 
> --------------------------------------------------------------
>
>                 Key: TIKA-3827
>                 URL: https://issues.apache.org/jira/browse/TIKA-3827
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tika User
>            Priority: Major
>         Attachments: Screenshot from 2022-08-04 06-05-09.png, example.DOC, 
> file_1.bmp, file_2.bmp, image-2022-08-04-10-52-44-800.png, 
> image-2022-08-04-10-53-48-894.png, image-2022-08-04-15-44-48-396.png, 
> image-2022-08-04-15-45-10-892.png
>
>
> When we tried to parse the .doc document, Tika extracted two mpga files which
> can't be opened or played. We suspect they should be bitmap image files.
> The Tika version we are using is 2.4.1.
> [^example.DOC]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
