[ https://issues.apache.org/jira/browse/HUDI-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3490:
--------------------------------------
    Fix Version/s: 0.11.0

> Timestamp conversion (parquet)
> ------------------------------
>
>                 Key: HUDI-3490
>                 URL: https://issues.apache.org/jira/browse/HUDI-3490
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Istvan Darvas
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Hi Guys!
>  
> My environment is Hudi 0.8.0 on AWS EMR 6.4.
>  
> Timestamp conversion seems confusing and inconsistent across the tools:
> 1.) DeltaStreamer appears to default to TIMESTAMP_MILLIS.
> 2.) The PySpark/Hudi API appears to default to TIMESTAMP_MICROS.
>  
> But the real issue for me is that I cannot control this.
>  
> Setting it explicitly does not work in DeltaStreamer:
>  --hoodie-conf hoodie.parquet.outputtimestamptype=TIMESTAMP_MICROS
> or in PySpark:
> {"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MILLIS"}
>  
> So I am not able to set one default for myself across systems. Of course I
> can convert the values myself, and I will do that as a workaround, but it
> would be great to have this convenient feature.
>  
> One more suggestion / idea:
> I do not know whether it is possible, but maybe this parameter
> (hoodie.parquet.outputtimestamptype) could be removed everywhere, and the
> framework could rely on the higher-level Spark setting instead, which is
>    spark.sql.parquet.outputTimestampType = TIMESTAMP_MILLIS / TIMESTAMP_MICROS
> For the INT96 storage type, which is not compatible with Avro, I think you
> could apply some automatic conversion, as long as it is well documented :)
>  
> In summary, I am confused and cannot rely on automatic timestamp conversion
> across the systems, so this should be standardized.
>  
> Thanks,
>  Darvi
>  
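The two defaults reported above differ only in unit. Parquet's TIMESTAMP_MILLIS and TIMESTAMP_MICROS types both store a 64-bit integer count since the Unix epoch. Below is a minimal sketch of the manual workaround the reporter mentions. It uses plain Python with no Spark or Hudi involved, treats values as UTC, and all helper names are hypothetical. It shows how one instant is encoded under each type and how to rescale stored values between them.

```python
from datetime import datetime, timedelta, timezone

# Both Parquet types hold an int64 count since the Unix epoch; only the unit differs.
# This sketch assumes timezone-aware datetimes interpreted as UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_micros(ts: datetime) -> int:
    """Exact microseconds since the epoch (TIMESTAMP_MICROS encoding)."""
    return (ts - EPOCH) // timedelta(microseconds=1)

def to_epoch_millis(ts: datetime) -> int:
    """Milliseconds since the epoch (TIMESTAMP_MILLIS encoding); sub-ms digits are floored away."""
    return to_epoch_micros(ts) // 1_000

def millis_to_micros(value: int) -> int:
    """Rescale a stored TIMESTAMP_MILLIS value to the TIMESTAMP_MICROS unit (lossless)."""
    return value * 1_000

def micros_to_millis(value: int) -> int:
    """Rescale a stored TIMESTAMP_MICROS value to the TIMESTAMP_MILLIS unit (floors, lossy)."""
    return value // 1_000

if __name__ == "__main__":
    ts = datetime(2022, 2, 23, 12, 34, 56, 789_123, tzinfo=timezone.utc)
    print("millis:", to_epoch_millis(ts))
    print("micros:", to_epoch_micros(ts))
```

Converting micros to millis drops the sub-millisecond digits, so normalizing everything to the microsecond unit is the lossless direction for such a workaround.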

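On the INT96 point: INT96 is Spark's legacy Parquet timestamp encoding and the default value of spark.sql.parquet.outputTimestampType in Spark 3.x. Each value is 12 bytes: a little-endian int64 of nanoseconds within the day, followed by a little-endian int32 Julian day number. Below is a rough sketch of the kind of automatic conversion suggested above. It assumes UTC values and does not handle any calendar rebasing Spark may apply to very old dates. The function names are hypothetical and are not part of Hudi or Spark.

```python
import struct

# Layout of a legacy INT96 timestamp: "<qi" = little-endian int64 nanos-of-day + int32 Julian day.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
MICROS_PER_DAY = 86_400 * 1_000_000

def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 value into microseconds since the epoch (nanos are floored)."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamp must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000

def epoch_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the epoch as a 12-byte INT96 value."""
    # divmod keeps micros_of_day non-negative, so pre-1970 values land on the previous day.
    days_since_epoch, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1_000, days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH)
```

Decoding to epoch microseconds puts INT96 data on the same footing as TIMESTAMP_MICROS, which Avro can represent with its timestamp-micros logical type.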


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
