[ 
https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929849#comment-17929849
 ] 

Daniel Stieglitz edited comment on NIFI-14106 at 2/24/25 5:06 PM:
------------------------------------------------------------------

The following commits by POI
 * [r1923766|https://svn.apache.org/viewvc?view=rev&rev=1923766]
 * [r1923780|https://svn.apache.org/viewvc?view=rev&rev=1923780]
 * [f61ddde|https://github.com/apache/poi/commit/f61dddea125824f6fb496adb6299e08625646c73]

obviate the need for PR [#9718|https://github.com/apache/nifi/pull/9718], as 
they solve both issues: copying dates correctly and avoiding exceeding the 
maximum number of cell styles when copying (NIFI-13726). We should monitor for 
the next POI release. Once that release is available and incorporated into 
NiFi, a new PR should be made for this ticket to revert the changes made in PR 
[#9466|https://github.com/apache/nifi/pull/9466] to resolve NIFI-13726, so 
that cell styles can be copied again, which would enable copying of date formats.


was (Author: JIRAUSER294662):
The following commits by POI
 * [r1923766|https://svn.apache.org/viewvc?view=rev&rev=1923766]
 * [r1923780|https://svn.apache.org/viewvc?view=rev&rev=1923780]
 * [f61ddde|https://github.com/apache/poi/commit/f61dddea125824f6fb496adb6299e08625646c73]

obviate the need for PR [#9718|https://github.com/apache/nifi/pull/9718], as 
they solve both issues: copying dates correctly and avoiding exceeding the 
maximum number of cell styles when copying (NIFI-13726). We should monitor for 
the next POI release. Once that release is available and incorporated into 
NiFi, the changes made in PR [#9466|https://github.com/apache/nifi/pull/9466] 
to resolve NIFI-13726 should be reverted so that cell styles can be copied 
again, which would enable copying of date formats.

> Loss of dates and times precision when running Excel with Dates through 
> SplitExcel
> ----------------------------------------------------------------------------------
>
>                 Key: NIFI-14106
>                 URL: https://issues.apache.org/jira/browse/NIFI-14106
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Daniel Stieglitz
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: fileWithIssue.xlsx
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The SplitExcel processor seems to be dropping the date format information 
> needed to correctly determine dates and times. I set up two flows and sent 
> the attached file to both:
> +Flow without SplitExcel+
> GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
> Strategy 'Use Starting Row' and CSVRecordSetWriter configured with Schema 
> Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
> HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> +Flow with SplitExcel+
> GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with 
> Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter configured 
> with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, 
> Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> The flow without SplitExcel correctly outputs the dates and 
> times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,01/01/2023,09:53:44
> 78,01/01/2023,09:55:16
> 80,01/01/2023,10:00:39
> 81,01/01/2023,10:03:55
> 82,01/01/2023,10:14:49{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
> while the flow with SplitExcel does not correctly output 
> the dates and times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,44927,-1
> 78,44927,-1
> 80,44927,-1
> 81,44927,-1
> 82,44927,-1{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
> Even if the schema for the SplitExcel flow is set to the one produced 
> by the flow without the SplitExcel processor, and the header is removed from 
> the attached file, the resulting dates and times are still incorrect
> {code:java}
> 75,01/01/1970,23:59:59
> 78,01/01/1970,23:59:59
> 80,01/01/1970,23:59:59
> 81,01/01/1970,23:59:59
> 82,01/01/1970,23:59:59{code}
> It would seem that when copying the contents from the original tab into a 
> new Excel spreadsheet (with one tab), the correct epoch must be copied over 
> for all dates/times/timestamps.
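
For reference, the value 44927 in the SplitExcel output is consistent with an Excel date serial number read without its date format: Excel stores a date as a day count and a time as a fraction of a day, and only the cell's format marks the number as a date. The following is a minimal sketch of that conversion using only java.time. It assumes the 1900 date system, and the class and method names (ExcelSerialDemo, serialToDate, serialToTime) are hypothetical, not part of NiFi or POI.

```java
import java.time.LocalDate;
import java.time.LocalTime;

public class ExcelSerialDemo {
    // Assumption: 1900 date system. Using 1899-12-30 as day zero accounts for
    // Excel's phantom 1900-02-29, so it is only valid for serials >= 61.
    private static final LocalDate EXCEL_1900_BASE = LocalDate.of(1899, 12, 30);
    private static final long SECONDS_PER_DAY = 86_400;

    // The whole part of the serial number is the day count.
    static LocalDate serialToDate(double serial) {
        return EXCEL_1900_BASE.plusDays((long) Math.floor(serial));
    }

    // The fractional part of the serial number is the time of day.
    static LocalTime serialToTime(double serial) {
        double fraction = serial - Math.floor(serial);
        long seconds = Math.round(fraction * SECONDS_PER_DAY);
        return LocalTime.ofSecondOfDay(seconds % SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        // Raw value seen in the transaction_date column of the SplitExcel output.
        System.out.println(serialToDate(44927)); // 2023-01-01

        // 09:53:44 expressed as a fraction of a day, added to the same date.
        double serial = 44927 + 35624.0 / 86400.0;
        System.out.println(serialToTime(serial)); // 09:53:44
    }
}
```

Serial 44927 maps to 01/01/2023, the date shown in the output of the flow without SplitExcel, which supports the reading that the numeric cell values survive the split and only the date format is lost.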



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
