[ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565200#comment-17565200
 ] 

Luís Filipe Nassif commented on TIKA-3815:
------------------------------------------

OK, we should at least change this SimpleDateFormat:

[https://github.com/apache/tika/blob/2.4.1/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-image-module/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L381]

to set its time zone explicitly to GMT. Otherwise it uses the default (local) 
time zone when formatting the Date (which is UTC-based), even if the output 
pattern prints no time zone information. Without this, running Tika in 
different time zones could return different date values.
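
To illustrate the point, here is a minimal standalone sketch (not Tika's actual code; the class name, method name and output pattern are my assumptions for the example). It formats the same Date with a zone-less pattern, once with an explicit GMT zone and once with a -03:00 zone standing in for a local default:

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of Tika.
public class TimeZoneFormatDemo {

    // Formats the Date using the given zone. The pattern prints no zone,
    // so the zone only shows up as a shift in the printed wall-clock time.
    static String format(Date date, TimeZone zone) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        // Without this call, SimpleDateFormat uses TimeZone.getDefault().
        df.setTimeZone(zone);
        return df.format(date);
    }

    public static void main(String[] args) {
        // A Date holds an instant (milliseconds since the epoch, UTC-based).
        Date date = Date.from(Instant.parse("2022-06-16T14:18:49Z"));

        // Explicit GMT: same output on every machine.
        System.out.println(format(date, TimeZone.getTimeZone("GMT")));

        // A -03:00 zone, standing in for a machine whose default is -03:00.
        System.out.println(format(date, TimeZone.getTimeZone("GMT-03:00")));
    }
}
```

The first call prints 2022-06-16T14:18:49 and the second prints 2022-06-16T11:18:49, although both format the same instant. Setting the zone explicitly removes the dependency on the machine's default.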

 

I'll submit a fix for that if there are no objections.

> Inconsistent Date/Time information extracted from Exif data
> -----------------------------------------------------------
>
>                 Key: TIKA-3815
>                 URL: https://issues.apache.org/jira/browse/TIKA-3815
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>            Reporter: Luís Filipe Nassif
>            Priority: Major
>         Attachments: IMG_20220616_111848_HDR.jpg
>
>
> Running tika-app-2.4.1.jar on the attached image returns this metadata:
> Exif IFD0:Date/Time: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
> Exif SubIFD:Time Zone: -03:00
> Exif SubIFD:Time Zone Digitized: -03:00
> Exif SubIFD:Time Zone Original: -03:00
> File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
> GPS:GPS Date Stamp: 2022:06:16
> GPS:GPS Time-Stamp: 14:18:47.000 UTC
> dcterms:created: 2022-06-16T08:18:49
> dcterms:modified: 2022-06-16T08:18:49
> exif:DateTimeOriginal: 2022-06-16T08:18:49
>  
> The right value is 2022-06-16T14:18:49Z. Although no time zone is 
> specified for some values, I think it makes no sense to convert them to a 
> time zone other than GMT, the one in which the picture was taken (-03:00), or 
> the one in which the application runs (-03:00), so Tika could be making an 
> incorrect time zone conversion on the last 3 fields.
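
For reference, the expected value quoted above follows from combining the Exif local time with the Exif offset. A small sketch with java.time, assuming the two strings are taken verbatim from the Date/Time Original and Time Zone Original fields listed above (the class and method names are hypothetical, and this is not how Tika computes it):

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class, not part of Tika.
public class ExifOffsetDemo {

    // Exif date/time strings use colons in the date part: "2022:06:16 11:18:49".
    private static final DateTimeFormatter EXIF_FORMAT =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    // Combines an Exif local date/time with an offset such as "-03:00"
    // and returns the corresponding UTC instant in ISO-8601 form.
    static String toUtc(String exifDateTime, String exifOffset) {
        LocalDateTime local = LocalDateTime.parse(exifDateTime, EXIF_FORMAT);
        OffsetDateTime withOffset = local.atOffset(ZoneOffset.of(exifOffset));
        return withOffset.toInstant().toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2022:06:16 11:18:49", "-03:00"));
    }
}
```

With the values from the attached image this gives 2022-06-16T14:18:49Z: local time 11:18:49 at offset -03:00 is 14:18:49 in UTC.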



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
