Adrian Bird created TIKA-4475:
---------------------------------

             Summary: ExifTool does not work with Tika on Windows
                 Key: TIKA-4475
                 URL: https://issues.apache.org/jira/browse/TIKA-4475
             Project: Tika
          Issue Type: Bug
    Affects Versions: 3.2.2
         Environment: Windows 11 with Tika 3.2.2.
            Reporter: Adrian Bird


I tried to get Tika and ExifTool to work together to process some JPEG image 
files and came across a number of issues. The issues are listed here, with a 
description of what I did below:
1) Tika and ExifTool don't work together on Windows
I used the [Wiki 
page|https://cwiki.apache.org/confluence/display/TIKA/EXIFToolParser] to 
understand how to do the integration.
Because I wasn't getting the metadata I expected, I used the '--verbose' option 
and got a Java Exception which contained this text:
 "WARN  [main] 07:13:34,699 org.apache.tika.parser.external.ExternalParser 
problem with process exec
java.io.IOException: Cannot run program "env": CreateProcess error=2, The 
system cannot find the file specified"
The exception occurs because 'env' is not a valid Windows command.
I tracked this down to the file 
'org\apache\tika\parser\external\tika-external-parsers.xml' in the Tika App jar 
where the command is:
'<command>env FOO=${OUTPUT} exiftool ${INPUT}</command>'
This doesn't work on Windows because 'env' does not exist.

As a test, I changed the command and updated my copy of the Tika App jar, and 
it then worked with a video file. The changed command was:
'<command>exiftool ${INPUT}</command>'
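
To illustrate the kind of change that might avoid the failure, here is a minimal Java sketch that drops a leading 'env NAME=value' prefix from a command template when the OS name starts with "Windows". The class and method names are hypothetical and are not part of Tika. The sketch also assumes the FOO variable is not needed by the external tool, which matched my test above but may not hold in general.

```java
import java.util.Arrays;
import java.util.Locale;

public class EnvPrefixSketch {

    // Hypothetical helper, not part of Tika: removes a leading
    // "env NAME=value ..." prefix from a command template.
    // Note: tokens are re-joined with single spaces.
    public static String stripEnvPrefix(String command) {
        String[] tokens = command.trim().split("\\s+");
        if (!tokens[0].equals("env")) {
            return command.trim();
        }
        int i = 1;
        // Skip the NAME=value assignments that follow "env".
        while (i < tokens.length
                && tokens[i].matches("[A-Za-z_][A-Za-z0-9_]*=.*")) {
            i++;
        }
        return String.join(" ", Arrays.copyOfRange(tokens, i, tokens.length));
    }

    // Returns the command unchanged unless osName looks like Windows.
    public static String commandFor(String osName, String command) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? stripEnvPrefix(command) : command;
    }

    public static void main(String[] args) {
        String template = "env FOO=${OUTPUT} exiftool ${INPUT}";
        // Prints the command that would be used on the current platform.
        System.out.println(commandFor(System.getProperty("os.name"), template));
    }
}
```

The OS name is passed in as a parameter rather than read inside the helper, so the Windows branch can be exercised on any platform.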

2) For the same reason, Tika and sox don't work together on Windows
The command is:
'<command>env FOO=${OUTPUT} sox --info ${INPUT}</command>'
Note: I didn't find any information on 'sox' on the Wiki.

3) Looking at the file 
'org\apache\tika\parser\external\tika-external-parsers.xml', I noticed that it 
only contains video-related mime-types, so I cannot use it with image 
files. The Wiki page says:
'EXIFTool is a wonderful tool that reads videos, images, audio and other media 
files and that extracts EXIF metadata from them.' 
I took this to mean that Tika can extract metadata from all three file types, 
but that isn't the case, as it only supports video files. 
Given this, can I suggest that the Wiki page be updated to make this 
clear?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
