[ https://issues.apache.org/jira/browse/HUDI-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457017#comment-17457017 ]

Sagar Sumit commented on HUDI-2971:
-----------------------------------

[~ryanpife] There was a [commit|https://github.com/apache/hudi/pull/3944/files#diff-22fb52b5cf28727ba23cb8bd4be820432a4e396ce663ac472a4677e889b7491eR543] which we [reverted|https://github.com/apache/hudi/commit/2f96f4300b37207703b477979f2461bdd294ccf9] recently. I think this issue could be related to that.
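
If it helps to narrow this down, one quick check is to disable the row-writer path for the bulk insert and see whether the follow-up write still corrupts the timestamps. A minimal sketch, with `df` and `basePath` as placeholders:

{code:scala}
// Sketch only: disable the Spark row writer for BULK_INSERT so the write goes
// through the non-row path; `df` and `basePath` are placeholders.
df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "bulk_insert")
  .option("hoodie.datasource.write.row.writer.enable", "false") // default is true in 0.9.0
  // ... plus the usual table options (table name, record key, partition path, precombine field)
  .mode("append")
  .save(basePath)
{code}

If the timestamps survive with the row writer disabled, that would point at the row-writer path touched by the commit above.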

> Timestamp values being corrupted when using BULK INSERT with row writing 
> enabled
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-2971
>                 URL: https://issues.apache.org/jira/browse/HUDI-2971
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Ryan Pifer
>            Priority: Major
>
> We found that, after performing a bulk insert of data that included timestamps
> and then performing other write operations on the table, the timestamp values
> of the records from the initial load were all corrupted. We narrowed this down
> to the case where row writing is enabled, which uses Spark Datasource V2. In
> Hudi 0.9.0 row writing is enabled by default.
> Performing 2 inserts on a new table: `ts_ts` is correct in both records
> (expected results)
> {code:java}
> scala> spark.read.format("hudi").load("s3://ryanpife-emr-dev/hudi/data/hudi090/timestamp/2/").show()
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|              ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |     20211022233434|  20211022233434_0_1|               101|                  2019|0db6c29d-5291-4f7...|101|      1|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> |     20211022233556|  20211022233556_0_1|               102|                  2019|0db6c29d-5291-4f7...|102|      2|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> {code}
>  
> Performing a bulk insert, then an insert: `ts_ts` is corrupted in the
> bulk-inserted record (corrupted result)
> {code:java}
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|               ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |     20211022232152|  20211022232152_0_1|               104|                  2019|dbdc2dd9-e870-4cf...|104|      4|     2019|2021-05-07 00:00:00|1970-01-19 18:05:...|
> |     20211022232441|  20211022232441_0_1|               105|                  2019|dbdc2dd9-e870-4cf...|105|      5|     2019|2021-05-07 00:00:00| 2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> {code}
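
For completeness, a rough spark-shell sketch of the scenario described above. The path, table name, and sample values are illustrative (not taken from the reporter's environment), and the option keys are the standard Hudi datasource write options:

{code:scala}
// Run in spark-shell with the Hudi bundle on the classpath; `spark` and
// spark.implicits._ are available there by default.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.to_timestamp
import spark.implicits._

val basePath = "file:///tmp/hudi_ts_repro" // illustrative local path

val hudiOpts = Map(
  "hoodie.table.name"                           -> "hudi_ts_repro",
  "hoodie.datasource.write.recordkey.field"     -> "id",
  "hoodie.datasource.write.partitionpath.field" -> "partition",
  "hoodie.datasource.write.precombine.field"    -> "version"
)

// 1) Initial load via BULK_INSERT with row writing enabled (the 0.9.0 default).
val df1 = Seq((104, 4, "2019", "2021-05-07 00:00:00"))
  .toDF("id", "version", "partition", "ts_string")
  .withColumn("ts_ts", to_timestamp($"ts_string"))

df1.write.format("hudi")
  .options(hudiOpts)
  .option("hoodie.datasource.write.operation", "bulk_insert")
  .option("hoodie.datasource.write.row.writer.enable", "true")
  .mode(SaveMode.Overwrite)
  .save(basePath)

// 2) Follow-up INSERT of another record into the same table.
val df2 = Seq((105, 5, "2019", "2021-05-07 00:00:00"))
  .toDF("id", "version", "partition", "ts_string")
  .withColumn("ts_ts", to_timestamp($"ts_string"))

df2.write.format("hudi")
  .options(hudiOpts)
  .option("hoodie.datasource.write.operation", "insert")
  .mode(SaveMode.Append)
  .save(basePath)

// 3) Read back and compare: per the report, ts_ts of the bulk-inserted record
//    (id 104) comes back corrupted while the inserted record (id 105) is fine.
spark.read.format("hudi").load(basePath).select("id", "ts_string", "ts_ts").show(false)
{code}

If the issue reproduces on 0.9.0, step 3 should show a `1970-01-19 ...` value for id 104 while id 105 keeps `2021-05-07 00:00:00`.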


