therealppk opened a new issue #2123:
URL: https://github.com/apache/hudi/issues/2123


   The timestamp columns come out wrong after the write, in spite of being 
correct in the DataFrame.
   The schema of the DataFrame is:
   ```
   root
    |-- branch_id: long (nullable = true)
    |-- comment: string (nullable = true)
    |-- created_at: timestamp (nullable = true)
   ...
   ```
   The output of the DataFrame show():
   ```
   +---------+--------------------+-------------------+
   |branch_id|             comment|         created_at|
   +---------+--------------------+-------------------+
   |    13501|                    |2017-05-09 08:21:35|
   |    14081|                    |2017-05-09 08:53:29|
   ...
   +---------+--------------------+-------------------+
   ```
   
   The output in Athena after storing the DataFrame in Hudi format:
   ```
   +---------+--------------------+-------------------------+
   |branch_id|             comment|               created_at|
   +---------+--------------------+-------------------------+
   |    13501|                    |+49134-01-07 05:30:00.000|
   |    14081|                    |+49153-08-06 07:20:00.000|
   ...
   +---------+--------------------+-------------------------+
   ```
   
   Code to write the DataFrame "main_df" in Hudi format:
   ```
   hudi_options = {
       'hoodie.table.name': table.name,
       'hoodie.datasource.write.recordkey.field': table.primary_key,
       'hoodie.datasource.write.partitionpath.field': partition_by,
       'hoodie.datasource.write.table.name': table.name,
       'hoodie.datasource.write.operation': "upsert",
       'hoodie.datasource.write.precombine.field': "ts_ms",
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2
   }
   
   (main_df.write.format("hudi")
       .options(**hudi_options)
       .mode("append")
       .save(desturl))
   ```
   The issue is that Athena recognises timestamps stored as int96, not the 
int64 that Hudi writes. What is the fix for this? 
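
As a rough sanity check on the magnitude of the bad values, here is a minimal sketch of the arithmetic. It assumes (this is not confirmed anywhere above) that the int64 column holds microseconds since the epoch and that the reader interprets it as milliseconds. The helper name `misread_year` is made up for illustration, and the sample timestamp is treated as UTC.

```python
from datetime import datetime, timezone

# Mean Gregorian year in seconds; good enough for an order-of-magnitude check.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600


def misread_year(ts: datetime) -> int:
    """Approximate year seen when an epoch value in microseconds
    is interpreted as milliseconds (assumed failure mode)."""
    micros = int(ts.timestamp()) * 1_000_000   # value assumed to be stored
    seconds_if_millis = micros / 1000          # what a millisecond reader computes
    return 1970 + int(seconds_if_millis // SECONDS_PER_YEAR)


# First created_at value from the DataFrame output, assumed to be UTC.
ts = datetime(2017, 5, 9, 8, 21, 35, tzinfo=timezone.utc)
print(misread_year(ts))
```

Under that assumption the year lands in the 49,000s, the same range as the values Athena shows, which would point to a unit mismatch on the int64 column rather than corrupted data.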
   

