[ 
https://issues.apache.org/jira/browse/NIFI-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883339#comment-17883339
 ] 

Daniel Stieglitz commented on NIFI-13744:
-----------------------------------------

[~john.wise] I just submitted a PR to fix this issue. The ExcelReader behavior 
is now akin to that of CSVReader and JsonTreeReader. 
I would just like to point out that, with the data supplied and the date and 
time patterns, there are errors even in the CSVReader and JsonTreeReader you 
had included in the template. Using the values the Excel file produced, I 
created the following CSV 
{code:java}
Date_Standard,Date_Custom,Time_Standard,Time_Custom,Timestamp
7/24/2024,07/25/24,04:00:00 PM,16:00,07/26/2024 16:00:00 
{code}
and JSON
{code:java}
{
   "Date_Standard" : "7/24/2024",
   "Date_Custom" : "07/25/24",
   "Time_Standard" : "04:00:00 PM",
   "Time_Custom" : "16:00",
   "Timestamp" : "07/26/2024 16:00:00"
}
{code}

The CSVReader, the JsonTreeReader, and now the ExcelReader with my fix all 
produce a similar error when the schema is supplied and the Date, Time and 
Timestamp formats are specified in any of these controller services. 
{code:java}
Caused by: java.time.format.DateTimeParseException: Text '7/24/24' could not be 
parsed at index 0
        at 
java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2108)
        at 
java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:2010)
        at java.base/java.time.LocalDate.parse(LocalDate.java:435)
        at 
org.apache.nifi.serialization.record.util.DataTypeUtils.parseLocalDate(DataTypeUtils.java:1239)
        at 
org.apache.nifi.serialization.record.util.DataTypeUtils.toLocalDate(DataTypeUtils.java:1185)
{code}

The reason this fails is the single month digit in '7/24/24', while your date 
pattern is 'MM/dd/yyyy'. For java.time.format.DateTimeFormatter to handle both 
single- and double-digit months, a single 'M' must be specified. I found this 
out the hard way when I had to address NIFI-13557. 
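This behavior can be seen with DateTimeFormatter directly, outside of NiFi (a standalone sketch, not NiFi code): 'MM' demands exactly two month digits, while a single 'M' accepts one or two.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class MonthPatternDemo {
    public static void main(String[] args) {
        // "MM" requires exactly two month digits, so a single-digit month fails
        try {
            LocalDate.parse("7/24/2024", DateTimeFormatter.ofPattern("MM/dd/yyyy"));
        } catch (DateTimeParseException e) {
            System.out.println("MM/dd/yyyy rejects 7/24/2024");
        }
        // A single "M" accepts one or two digits, so both forms parse
        DateTimeFormatter flexible = DateTimeFormatter.ofPattern("M/dd/yyyy");
        System.out.println(LocalDate.parse("7/24/2024", flexible));  // 2024-07-24
        System.out.println(LocalDate.parse("07/24/2024", flexible)); // 2024-07-24
    }
}
```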
Now even if you fix this and specify the date pattern as 'M/dd/yyyy', all 
readers will still fail when reading the "Date_Custom" field, since it is 
specified as a date in the schema, and the only way any of these controller 
services can convert it to a date is with the date pattern, which will not 
match "07/25/24". A similar problem exists for the "Time_Standard" and 
"Time_Custom" fields: the controller services need the time pattern to convert 
them, and no single pattern you specify can satisfy both. So the bottom line 
is: if you have multiple date (or time or timestamp) formats in your data and 
you specify a schema, then specify the type for only one of them along with a 
corresponding date pattern, and specify the other(s) as strings.
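For example, a schema along these lines (a sketch, assuming the reader's date pattern is set to 'M/dd/yyyy') types only "Date_Standard" as a date and leaves "Date_Custom" as a plain string:
{code:java}
{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [
    { "name": "Date_Standard", "type": [ "null", { "type": "int", "logicalType": "date" } ] },
    { "name": "Date_Custom",   "type": [ "null", "string" ] }
  ]
}
{code}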
The same goes for inferring. In order to interpret fields as a date, time or 
timestamp, the corresponding patterns must be supplied in the controller 
service. The controller services in the attached template do not specify any 
date, time or timestamp patterns, hence all fields are inferred as strings. 
Just note, though, that with your sample data only one of the dates and one of 
the times would be inferred as such, and the others as strings.
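The single-pattern limitation for the two time fields can also be seen directly with DateTimeFormatter (again a standalone sketch, not NiFi code): a pattern matching one representation rejects the other.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class TimePatternDemo {
    public static void main(String[] args) {
        // 12-hour pattern parses "04:00:00 PM" but rejects "16:00"
        DateTimeFormatter twelveHour =
                DateTimeFormatter.ofPattern("hh:mm:ss a", Locale.ENGLISH);
        System.out.println(LocalTime.parse("04:00:00 PM", twelveHour)); // 16:00
        try {
            LocalTime.parse("16:00", twelveHour);
        } catch (DateTimeParseException e) {
            System.out.println("hh:mm:ss a rejects 16:00");
        }
        // 24-hour pattern parses "16:00" but rejects "04:00:00 PM"
        DateTimeFormatter twentyFourHour = DateTimeFormatter.ofPattern("HH:mm");
        System.out.println(LocalTime.parse("16:00", twentyFourHour)); // 16:00
        try {
            LocalTime.parse("04:00:00 PM", twentyFourHour);
        } catch (DateTimeParseException e) {
            System.out.println("HH:mm rejects 04:00:00 PM");
        }
    }
}
```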




> ExcelReader time conversion issues
> ----------------------------------
>
>                 Key: NIFI-13744
>                 URL: https://issues.apache.org/jira/browse/NIFI-13744
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.27.0, 2.0.0-M5
>         Environment: RHEL 8.10, macOS Sequoia 15
>            Reporter: John Wise
>            Assignee: Daniel Stieglitz
>            Priority: Major
>              Labels: Bug, Reader
>         Attachments: Excel_Reader_-_Time_Conversion_Issues.xml, test.xlsx
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [~dstiegli1] - There are actually two issues with time conversion in 
> ExcelReader.  I'm using the following sample set of data to illustrate both 
> issues:
> {code:java}
> Date_Standard: "07/24/24",
> Date_Custom: "07/25/24",
> Time_Standard: "4:00:00 PM",
> Time_Custom: "16:00:00",
> Timestamp: "07/26/24 16:00:00"{code}
> 1. When inferring the schema, all of the Avro types are incorrectly 
> identified as strings:
> {code:java}
> {
>   "type":"record",
>   "name":"nifiRecord",
>   "namespace":"org.apache.nifi",
>   "fields":[
>     {"name":"column_0","type":["string","null"]},
>     {"name":"column_1","type":["string","null"]},
>     {"name":"column_2","type":["string","null"]},
>     {"name":"column_3","type":["string","null"]},
>     {"name":"column_4","type":["string","null"]}
>   ]
> }{code}
> The output consists solely of epoch strings; the fourth value, 
> {{"-2208999600000"}}, appears to be incorrect, since it isn't in line with 
> the other values at all.
> *Inferred output:*
> {code:java}
> [ {
>   "column_0" : "1721793600000",
>   "column_1" : "1721880000000",
>   "column_2" : "1721851200000",
>   "column_3" : "-2208999600000",
>   "column_4" : "1722024000000"
> } ]{code}
>  
> 2. The second issue occurs when the Avro schema is provided:
> {code:java}
> {
>   "type": "record",
>   "name": "nifiRecord",
>   "namespace": "org.apache.nifi",
>   "fields": [
>     {
>       "name": "Date_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Date_Custom",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Time_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "time-millis" } ]
>     },
>     {
>       "name": "Time_Custom",
>       "type": [ "null", { "type": "long", "logicalType": "time-micros" } ]
>     },
>     {
>       "name": "Timestamp",
>       "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } 
> ]
>     }
>   ]
> } 
> {code}
> Conversion of the {{time-millis}} and {{time-micros}} fields both fail with 
> errors similar to this:
> {code:java}
> • 18:01:24 EDT ERROR
> ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process 
> FlowFile [filename=test.xlsx]; will route to failure: 
> org.apache.nifi.processor.exception.ProcessException: Could not parse 
> incoming data
> - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read 
> next Record from Excel XLSX failed
> - Caused by: 
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class 
> java.util.Date to Time for field Time_Custom
> {code}
> Changing the failed types to "string" in order to allow conversion of the 
> other values results in epoch output for those values:
> {code:java}
> [ {
>   "Date_Standard" : "07/24/2024",
>   "Date_Custom" : "07/25/2024",
>   "Time_Standard" : "1721851200000",
>   "Time_Custom" : "-2208999600000",
>   "Timestamp" : "07/26/2024 16:00:00"
> } ]{code}
>  
> Given the same data in both JSON and CSV formats, with corresponding Readers, 
> both the inferred and schema-provided outputs are as expected.  This appears 
> to be an issue in ExcelReader.
> I've attached the spreadsheet & a template of the NiFi flow that I've been 
> troubleshooting this with.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
