neerajpadarthi opened a new issue, #6232:
URL: https://github.com/apache/hudi/issues/6232
Hi Team,
With the configs below, Hudi truncates the fractional seconds of timestamp
columns while ingesting the data. We are currently on Hudi 0.9, where I
observe this issue; the same write keeps the fractional seconds on Hudi 0.11.
Is there any additional configuration that makes this work on 0.9 without
migrating to 0.11? Any help on how to avoid this issue would be greatly
appreciated.
Configs
db_name = 'tst_db'
tableName = 'tst_tb'
pk = 'id'                 # record key column
de_dup = 'last_updated'   # precombine (de-duplication) column

# Options shared by every write to this table
commonConfig = {
    'hoodie.datasource.hive_sync.database': db_name,
    'hoodie.table.name': tableName,
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.datasource.write.recordkey.field': pk,
    'hoodie.datasource.write.precombine.field': de_dup,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': tableName
}
# Options for a non-partitioned table
nonPartitionConfig = {
    'hoodie.datasource.hive_sync.partition_extractor_class':
        'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.write.keygenerator.class':
        'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
}
# Write operation
config = {
    'hoodie.bulkinsert.shuffle.parallelism': 10,
    'hoodie.datasource.write.operation': 'bulk_insert'
}
S3Location = 's3://<>/hudi/tst_tb'
combinedConf = {**commonConfig, **nonPartitionConfig, **config}
# df is the source DataFrame shown under "Source Data" below
df.write.format('org.apache.hudi').options(
    **combinedConf).mode('overwrite').save(S3Location)
Environment Description
EMR: emr-6.5.0
Hudi version : 0.9
Spark version : Spark 3.1.2
Hive version : Hive 3.1.2
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Source Data
+----------+--------------------------+--------------------------+
|id |creation_date |last_updated |
+----------+--------------------------+--------------------------+
|7cb15b859e|2021-11-07 08:48:25.000232|2021-11-08 08:50:35.000359|
|60ab5da73a|2022-07-02 19:48:27.000891|2022-07-03 20:05:19.000364|
|abb663a826|2015-07-12 15:35:14 |2015-08-01 15:38:07 |
|c92aaeedc1|2021-05-10 16:47:10.000455|2021-05-30 16:49:29.00063 |
+----------+--------------------------+--------------------------+
Source Schema
root
|-- id: string (nullable = true)
|-- creation_date: timestamp (nullable = true)
|-- last_updated: timestamp (nullable = true)
Hudi 0.9 Output
+-------------------+--------------------+------------------+----------------------+---------------------------------------------------------------------+----------+-------------------+-------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                    |id        |creation_date      |last_updated       |
+-------------------+--------------------+------------------+----------------------+---------------------------------------------------------------------+----------+-------------------+-------------------+
|20220728035114     |20220728035114_3_2  |c92aaeedc1        |                      |1736fb90-f6b2-4282-9c77-da2ace4bf0bd-0_3-10-80_20220728035114.parquet|c92aaeedc1|2021-05-10 16:47:10|2021-05-30 16:49:29|
|20220728035114     |20220728035114_1_3  |7cb15b859e        |                      |d650a502-386e-47b9-81f3-e72cf64b0c0e-0_1-10-78_20220728035114.parquet|7cb15b859e|2021-11-07 08:48:25|2021-11-08 08:50:35|
|20220728035114     |20220728035114_2_1  |abb663a826        |                      |941ca621-111e-47d9-8ca1-bdc943490371-0_2-10-79_20220728035114.parquet|abb663a826|2015-07-12 15:35:14|2015-08-01 15:38:07|
|20220728035114     |20220728035114_0_1  |60ab5da73a        |                      |2d2fb872-7775-4b2d-bd28-93c289ae12c8-0_0-8-77_20220728035114.parquet |60ab5da73a|2022-07-02 19:48:27|2022-07-03 20:05:19|
+-------------------+--------------------+------------------+----------------------+---------------------------------------------------------------------+----------+-------------------+-------------------+
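One observation on the 0.9 output: every fractional part in the source data is below one millisecond (for example .000232), so the result is consistent with the timestamps being cut to millisecond precision somewhere in the write path. This is only a hypothesis; nothing above confirms where or why it would happen. A minimal pure-Python sketch that reproduces the 0.9 creation_date values from the source values under that assumption (the helper names parse_ts, truncate_to_millis and show are illustrative, not Hudi or Spark APIs):

```python
from datetime import datetime

# creation_date values copied from the "Source Data" table above.
SOURCE = {
    "7cb15b859e": "2021-11-07 08:48:25.000232",
    "60ab5da73a": "2022-07-02 19:48:27.000891",
    "abb663a826": "2015-07-12 15:35:14",
    "c92aaeedc1": "2021-05-10 16:47:10.000455",
}


def parse_ts(text):
    """Parse a timestamp string with an optional fractional part."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


def truncate_to_millis(ts):
    """Assumed behaviour: drop everything below one millisecond."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


def show(ts):
    """Render like the tables above: omit a zero fraction, strip trailing zeros."""
    base = ts.strftime("%Y-%m-%d %H:%M:%S")
    if ts.microsecond == 0:
        return base
    return base + "." + f"{ts.microsecond:06d}".rstrip("0")


for key, raw in SOURCE.items():
    print(key, show(truncate_to_millis(parse_ts(raw))))
```

If the loss were at a different granularity (for example whole seconds), the same source data would not distinguish the two cases, because none of the sample values has a non-zero millisecond component.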
Hudi 0.11 Output
+-------------------+---------------------+------------------+----------------------+-------------------------------------------------------------------------+----------+--------------------------+--------------------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                        |id        |creation_date             |last_updated              |
+-------------------+---------------------+------------------+----------------------+-------------------------------------------------------------------------+----------+--------------------------+--------------------------+
|20220728035802662  |20220728035802662_0_1|1340225           |                      |38263eea-aa5d-4adf-b7f1-f11ebd2f9142-0_0-2522-0_20220728035802662.parquet|1340225   |2017-01-24 00:02:10       |2022-02-25 07:03:54.000853|
|20220728035802662  |20220728035802662_0_2|53773de3-9        |                      |38263eea-aa5d-4adf-b7f1-f11ebd2f9142-0_0-2522-0_20220728035802662.parquet|53773de3-9|2022-02-25 07:21:06.000037|2022-02-25 08:35:57.000877|
|20220728035802662  |20220728035802662_0_3|722b232f-e        |                      |38263eea-aa5d-4adf-b7f1-f11ebd2f9142-0_0-2522-0_20220728035802662.parquet|722b232f-e|2022-02-22 06:02:32.000481|2022-02-25 08:54:05.00042 |
+-------------------+---------------------+------------------+----------------------+-------------------------------------------------------------------------+----------+--------------------------+--------------------------+