[ https://issues.apache.org/jira/browse/HUDI-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457017#comment-17457017 ]
Sagar Sumit edited comment on HUDI-2971 at 12/10/21, 10:01 AM:
---------------------------------------------------------------

[~ryanpife] There was a [commit|https://github.com/apache/hudi/pull/3944/files#diff-22fb52b5cf28727ba23cb8bd4be820432a4e396ce663ac472a4677e889b7491eR543] which we [reverted|https://github.com/apache/hudi/commit/2f96f4300b37207703b477979f2461bdd294ccf9] recently. I think this issue could be related to that.

What are your Hudi configs? See also https://issues.apache.org/jira/browse/HUDI-2909


was (Author: codope):
[~ryanpife] There was a [commit|https://github.com/apache/hudi/pull/3944/files#diff-22fb52b5cf28727ba23cb8bd4be820432a4e396ce663ac472a4677e889b7491eR543] which we [reverted|https://github.com/apache/hudi/commit/2f96f4300b37207703b477979f2461bdd294ccf9] recently. I think this issue could be related to that.

> Timestamp values being corrupted when using BULK INSERT with row writing enabled
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-2971
>                 URL: https://issues.apache.org/jira/browse/HUDI-2971
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Ryan Pifer
>            Priority: Major
>
> We found that after bulk-inserting data that included timestamps and then
> performing other write operations on the table, the timestamps of the records
> from the initial load were all corrupted. We narrowed this down to row writing
> being enabled, which uses Spark Datasource V2. In Hudi 0.9.0, row writing is
> enabled by default.
>
> After performing 2 inserts on a new table, `ts_ts` matches `ts_string` in both
> records (expected result):
> {code:java}
> scala> spark.read.format("hudi").load("s3://ryanpife-emr-dev/hudi/data/hudi090/timestamp/2/").show()
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|              ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |     20211022233434|  20211022233434_0_1|               101|                  2019|0db6c29d-5291-4f7...|101|      1|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> |     20211022233556|  20211022233556_0_1|               102|                  2019|0db6c29d-5291-4f7...|102|      2|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> {code}
>
> After performing a bulk insert and then an insert, `ts_ts` no longer matches
> `ts_string` for the bulk-inserted record (corrupted result):
> {code:java}
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|               ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |     20211022232152|  20211022232152_0_1|               104|                  2019|dbdc2dd9-e870-4cf...|104|      4|     2019|2021-05-07 00:00:00|1970-01-19 18:05:...|
> |     20211022232441|  20211022232441_0_1|               105|                  2019|dbdc2dd9-e870-4cf...|105|      5|     2019|2021-05-07 00:00:00| 2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
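
A note on the corrupted value: `1970-01-19 18:05:...` is what one would get if the epoch-millisecond value of `2021-05-07 00:00:00` were read back as microseconds. The sketch below only checks that arithmetic. It assumes a UTC session time zone (the report does not state one), uses only `java.time`, and does not touch Hudi or Spark, so it shows that the symptom is consistent with a millisecond/microsecond unit mismatch without identifying where such a mismatch would occur.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

// Assumption: timestamps in the report are displayed in UTC.
val expected: Instant =
  LocalDateTime.parse("2021-05-07T00:00:00").toInstant(ZoneOffset.UTC)

// The correct value as milliseconds since the Unix epoch.
val epochMillis: Long = expected.toEpochMilli // 1620345600000L

// Hypothetical misread: interpret that same number as microseconds.
val misread: Instant = Instant.EPOCH.plus(epochMillis, ChronoUnit.MICROS)

println(misread) // 1970-01-19T18:05:45.600Z
```

The snippet can be pasted into spark-shell or any Scala REPL. The result matches the visible prefix of the corrupted `ts_ts` value for record 104.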