yanxiang created HUDI-4119:
------------------------------

             Summary: The first read result is incorrect when the Flink upsert-kafka
connector is used with Hudi
                 Key: HUDI-4119
                 URL: https://issues.apache.org/jira/browse/HUDI-4119
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: yanxiang
             Fix For: 0.11.0


The first read result is incorrect when the Flink upsert-kafka connector is used
with Hudi.
 
ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, queried in
streaming mode)
 
Here is the case:
 
1. First query: write two records with the same primary key into Kafka and
insert them into the Hudi table. The query result should be three rows: +I
first record, -U first record, +U second record. But the first time I queried
the Hudi table, every row came back as +I (+I first record, +I first record,
+I second record), with no update operations.
The three +I rows break the downstream ETL on the Hudi table: groupBy results
are inaccurate.
2. Second query: stop the first query and restart the query job on the Hudi
table. The results are now correct: +I first record, -U first record, +U
second record.
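
To make the impact on groupBy concrete, here is a minimal Python sketch of the
two changelogs described above. It is not Hudi or Flink code; the helper names
(expected_changelog, sum_by_key) and the sample key and values are made up for
illustration only.

```python
from collections import namedtuple

# One changelog row: operation tag, primary key, numeric payload.
Change = namedtuple("Change", ["op", "key", "value"])


def expected_changelog(records):
    """Turn a sequence of (key, value) upserts into changelog rows.

    The first write of a key yields +I; each later write of the same key
    yields -U (retract the old value) followed by +U (the new value).
    """
    latest = {}
    out = []
    for key, value in records:
        if key in latest:
            out.append(Change("-U", key, latest[key]))
            out.append(Change("+U", key, value))
        else:
            out.append(Change("+I", key, value))
        latest[key] = value
    return out


def sum_by_key(changes):
    """A groupBy-style SUM that honours retractions (-U, -D subtract)."""
    totals = {}
    for c in changes:
        sign = -1 if c.op in ("-U", "-D") else 1
        totals[c.key] = totals.get(c.key, 0) + sign * c.value
    return totals


# Two records with the same primary key (hypothetical sample data).
records = [("id1", 10), ("id1", 20)]

correct = expected_changelog(records)
# What the first query reportedly returned: the same rows, all tagged +I.
observed = [Change("+I", c.key, c.value) for c in correct]

print(sum_by_key(correct))   # only the latest value counts
print(sum_by_key(observed))  # the old value is counted twice on top of it
```

With the correct changelog the -U row cancels the first insert, so the sum
reflects only the second record. With three +I rows nothing is retracted, which
is why the aggregate is inflated.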
 
Reason:
There is a bug in the program. When no data log file has been generated, the
schema does not include the column '_hoodie_operation'. Please refer to the
following link for details:
[https://www.jianshu.com/p/29f9ec5e606e]
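
The symptom is consistent with a reader that falls back to +I whenever the
operation column is absent from the schema. The Python sketch below only
illustrates that assumed fallback; resolve_ops, the dict-based rows, and the
string tags are hypothetical and are not taken from the Hudi source.

```python
OPERATION_FIELD = "_hoodie_operation"


def resolve_ops(schema_fields, rows):
    """Return the operation tag for each row.

    Assumption for illustration: if the schema lacks the operation column,
    every row is treated as an insert (+I).
    """
    has_op = OPERATION_FIELD in schema_fields
    return [row[OPERATION_FIELD] if has_op else "+I" for row in rows]


# Hypothetical rows as they might be stored for the case in this report.
rows = [
    {"id": "id1", "value": 10, OPERATION_FIELD: "+I"},
    {"id": "id1", "value": 10, OPERATION_FIELD: "-U"},
    {"id": "id1", "value": 20, OPERATION_FIELD: "+U"},
]

# Schema resolved without the operation column (as reported for the first read).
first_read = resolve_ops(["id", "value"], rows)
# Schema that includes the operation column (as after the restart).
second_read = resolve_ops(["id", "value", OPERATION_FIELD], rows)

print(first_read)
print(second_read)
```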



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
