[ https://issues.apache.org/jira/browse/HUDI-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-3490:
--------------------------------------
Fix Version/s: 0.11.0
> Timestamp conversion (parquet)
> ------------------------------
>
> Key: HUDI-3490
> URL: https://issues.apache.org/jira/browse/HUDI-3490
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Istvan Darvas
> Priority: Major
> Fix For: 0.11.0
>
>
> Hi Guys!
>
> My environment is Hudi 0.8.0 on AWS EMR 6.4.
>
> It seems timestamp conversion is very confusing and inconsistent across
> the tools:
> 1.) DeltaStreamer appears to default to TIMESTAMP_MILLIS.
> 2.) The PySpark/Hudi API appears to default to TIMESTAMP_MICROS.
>
> But the real issue for me is that I cannot control this.
>
> Neither in DeltaStreamer:
> --hoodie-conf hoodie.parquet.outputtimestamptype=TIMESTAMP_MICROS
> Nor in PySpark:
> {"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MILLIS"}
>
> So I am not able to set a single default across systems. Of course, I can
> convert the values myself, and I will do that as a workaround, but it would
> be great to have this convenience feature.
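> A minimal sketch of that manual workaround in plain Python, assuming the
> values are the raw int64 counts that the Parquet TIMESTAMP_MICROS and
> TIMESTAMP_MILLIS logical types store (microseconds or milliseconds since
> 1970-01-01 UTC). The helper names are hypothetical, not Hudi or Spark APIs.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; both Parquet timestamp units count from here.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Hypothetical helper: aware datetime -> int64 microseconds since epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def micros_to_millis(micros: int) -> int:
    """TIMESTAMP_MICROS -> TIMESTAMP_MILLIS (drops sub-millisecond digits).

    Floor division rounds pre-1970 (negative) values down, so every value
    maps to the millisecond that contains it.
    """
    return micros // 1000


def millis_to_micros(millis: int) -> int:
    """TIMESTAMP_MILLIS -> TIMESTAMP_MICROS (lossless)."""
    return millis * 1000


def from_micros(micros: int) -> datetime:
    return EPOCH + timedelta(microseconds=micros)


def from_millis(millis: int) -> datetime:
    return EPOCH + timedelta(milliseconds=millis)
```

> Converting MICROS to MILLIS loses the last three digits, so a round trip
> only preserves millisecond precision.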
>
> One more suggestion / idea:
> I do not know whether it is possible, but maybe this parameter
> (hoodie.parquet.outputtimestamptype) could be removed everywhere, and
> the framework could use the high-level contract from Spark instead,
> which is:
> spark.sql.parquet.outputTimestampType = TIMESTAMP_MILLIS / TIMESTAMP_MICROS
> Spark's default storage type is INT96, which is not compatible with Avro,
> but here I think you could do some automatic conversion and document it
> well :)
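> For reference, an INT96 value can be converted to epoch microseconds by
> hand. A minimal sketch, assuming the common Impala/Spark layout: 8 bytes of
> little-endian nanoseconds within the day, followed by 4 bytes of
> little-endian Julian day number (day 2440588 is 1970-01-01). It ignores any
> calendar rebasing or time-zone adjustment a reader may apply; the function
> names are hypothetical.

```python
import struct

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into microseconds since the epoch."""
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    # "<qi": little-endian int64 (nanos of day) then int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days * MICROS_PER_DAY + nanos_of_day // 1000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the epoch as a 12-byte INT96 value."""
    # divmod keeps micros_of_day non-negative, even for pre-1970 values.
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1000,
                       days + JULIAN_DAY_OF_UNIX_EPOCH)
```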
>
> In summary, I am confused, and I am not able to rely on automatic timestamp
> conversion across systems. This should be standardized.
>
> Thanks,
> Darvi
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)