[ https://issues.apache.org/jira/browse/HUDI-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-2971:
------------------------------
Status: Resolved (was: Patch Available)
> Timestamp values being corrupted when using BULK INSERT with row writing enabled
> --------------------------------------------------------------------------------
>
> Key: HUDI-2971
> URL: https://issues.apache.org/jira/browse/HUDI-2971
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Ryan Pifer
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.11.0, 0.10.1
>
>
> We found that after a bulk insert of data containing timestamps, subsequent
> write operations on the table corrupted the timestamps of all records from
> the initial load. We narrowed this down to row writing being enabled, which
> uses Spark Datasource V2. In Hudi 0.9.0, row writing is enabled by default.
>
> Performing 2 inserts on a new table: `ts_ts` matches `ts_string` in both
> records (expected result)
> {code:java}
> scala> spark.read.format("hudi").load("s3://ryanpife-emr-dev/hudi/data/hudi090/timestamp/2/").show()
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|              ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |     20211022233434|  20211022233434_0_1|               101|                  2019|0db6c29d-5291-4f7...|101|      1|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> |     20211022233556|  20211022233556_0_1|               102|                  2019|0db6c29d-5291-4f7...|102|      2|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> {code}
>
> Performing a bulk insert followed by an insert: the `ts_ts` values do not
> match across the records (corrupted result)
> {code:java}
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|               ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |     20211022232152|  20211022232152_0_1|               104|                  2019|dbdc2dd9-e870-4cf...|104|      4|     2019|2021-05-07 00:00:00|1970-01-19 18:05:...|
> |     20211022232441|  20211022232441_0_1|               105|                  2019|dbdc2dd9-e870-4cf...|105|      5|     2019|2021-05-07 00:00:00| 2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> {code}
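
The report does not state a root cause, but the corrupted value is consistent with a timestamp unit mismatch. The sketch below is only an illustration of that arithmetic, not the Hudi code path: it assumes the intended value is 2021-05-07 00:00:00 in UTC, and it hypothesizes that a millisecond epoch count was read back as microseconds. It uses only `java.time`, so it can be pasted into a Scala or spark-shell session.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

// Value shown in the ts_string column of the report, read as UTC (assumption).
val expected: Instant =
  LocalDateTime.parse("2021-05-07T00:00:00").toInstant(ZoneOffset.UTC)

// The same instant expressed as milliseconds since the Unix epoch.
val epochMillis: Long = expected.toEpochMilli

// Hypothetical misread: treat the millisecond count as if it were microseconds.
val misread: Instant = Instant.EPOCH.plus(epochMillis, ChronoUnit.MICROS)

println(s"expected = $expected")
println(s"misread  = $misread") // lands on 1970-01-19, shortly after 18:05 UTC
```

Under these assumptions the misread instant falls on 1970-01-19 at 18:05 UTC, which lines up with the truncated `1970-01-19 18:05:...` shown for record 104.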