neerajpadarthi commented on issue #6232:
URL: https://github.com/apache/hudi/issues/6232#issuecomment-1199914603
@yihua
Hey, I have verified the same on Hudi 0.10.1, but no luck: the precision is still being truncated. Below are the configs, Spark session details, and the Spark/Hudi outputs. Could you please verify and let me know if anything is missing here?
Thanks
===Environment Details
EMR: emr-6.6.0
Hudi version : 0.10.1
Spark version : Spark 3.2.0
Hive version : Hive 3.1.2
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
===Spark Configs
from pyspark.sql import SparkSession

def create_spark_session():
    spark = SparkSession \
        .builder \
        .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension") \
        .config("spark.sql.parquet.writeLegacyFormat", "true") \
        .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS") \
        .config("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY") \
        .config("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY") \
        .enableHiveSupport() \
        .getOrCreate()
    return spark
===Hudi Configs
db_name = <>
tableName = <>
pk = <>
de_dup = <>

commonConfig = {
    'hoodie.datasource.hive_sync.database': db_name,
    'hoodie.table.name': tableName,
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.datasource.write.recordkey.field': pk,
    'hoodie.datasource.write.precombine.field': de_dup,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': tableName
}

nonPartitionConfig = {
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
}

config = {
    'hoodie.bulkinsert.shuffle.parallelism': 10,
    'hoodie.datasource.write.operation': 'bulk_insert',
    'hoodie.parquet.outputtimestamptype': 'TIMESTAMP_MICROS'
    # 'hoodie.datasource.write.row.writer.enable': 'false'
}
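For reference, a minimal sketch of how these config dictionaries would typically be combined into a single options map for the write (the placeholder values below stand in for the elided `<>` entries above and are only illustrative; the commented-out write call assumes a DataFrame `df` and a `target_path`):

```python
# Placeholder values standing in for the elided <> entries above.
commonConfig = {"hoodie.table.name": "my_table",
                "hoodie.datasource.hive_sync.support_timestamp": "true"}
nonPartitionConfig = {"hoodie.datasource.write.keygenerator.class":
                      "org.apache.hudi.keygen.NonpartitionedKeyGenerator"}
config = {"hoodie.datasource.write.operation": "bulk_insert"}

# Later dicts win on key collisions, so merge from generic to specific.
combined = {**commonConfig, **nonPartitionConfig, **config}

# The actual write would then be (requires pyspark and a SparkSession):
# df.write.format("hudi").options(**combined).mode("append").save(target_path)
```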
===Spark DF Output
+----------+--------------------------+--------------------------+
|id |creation_date |last_updated |
+----------+--------------------------+--------------------------+
|1340225 |2017-01-24 00:02:10 |2022-02-25 07:03:54.000853|
|722b232f-e|2022-02-22 06:02:32.000481|2022-02-25 08:54:05.00042 |
|53773de3-9|2022-02-25 07:21:06.000037|2022-02-25 08:35:57.000877|
+----------+--------------------------+--------------------------+
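As a plain-Python illustration of the symptom (no Hudi involved; the sample value is taken from the `last_updated` column above): the source rows carry microsecond precision, and truncating to whole seconds reproduces exactly what the Hudi query returns below.

```python
from datetime import datetime

# Sample value from the Spark DF output: 2022-02-25 07:03:54.000853
src = datetime(2022, 2, 25, 7, 3, 54, 853)

# TIMESTAMP_MICROS represents this as microseconds since the Unix epoch,
# so the fractional part is present in the written Parquet data.
td = src - datetime(1970, 1, 1)
micros = td.days * 86_400_000_000 + td.seconds * 1_000_000 + td.microseconds

# Dropping the sub-second part reproduces the truncated Hudi output.
truncated = src.replace(microsecond=0)
print(truncated)  # 2022-02-25 07:03:54
```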
===Hudi V0.10.1 Output
+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+-------------------+-------------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                       |id        |creation_date      |last_updated       |
+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+-------------------+-------------------+
|20220729201157281  |20220729201157281_1_2|53773de3-9        |                      |55f7c820-c289-4eb7-aabc-4f079bd44536-0_1-11-10_20220729201157281.parquet|53773de3-9|2022-02-25 07:21:06|2022-02-25 08:35:57|
|20220729201157281  |20220729201157281_2_3|722b232f-e        |                      |0dd8d6c2-9d64-40d7-a4db-bf7cf95bd02c-0_2-11-11_20220729201157281.parquet|722b232f-e|2022-02-22 06:02:32|2022-02-25 08:54:05|
|20220729201157281  |20220729201157281_0_1|1340225           |                      |2e0cf27b-999d-4d5e-9c4e-52d27c25294e-0_0-9-9_20220729201157281.parquet  |1340225   |2017-01-24 00:02:10|2022-02-25 07:03:54|
+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+-------------------+-------------------+