aliceyyan opened a new pull request, #5626:
URL: https://github.com/apache/hudi/pull/5626

   
   ## What is the purpose of the pull request
   
   The first read returns an incorrect result when the Flink upsert-kafka 
connector is used with Hudi.
   
   ## Brief change log
   - Add the `tableDataExists` method to class `StreamerUtil`.
   - Modify the `metaClientForReader` method in class `StreamerUtil`.
   
   ## Committer checklist
   
    - [4119] Has a corresponding JIRA in PR title & commit
    
    The first read returns an incorrect result when the Flink upsert-kafka 
connector is used with Hudi.
    
    ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, queried as 
a stream)
    
   Here is the case:
    
   1. First query: write two records with the same primary key to Kafka and 
insert them into the Hudi table. The query result should be three records: +I 
first record, -U first record, +U second record. However, the first time I 
queried the Hudi table, every operation was +I (+I first record, +I first 
record, and +I second record) and there was no update operation. 
    The three +I records break the downstream ETL process on the Hudi table: 
the groupBy results are inaccurate. 
   2. Second query: after exiting the first query and restarting the query job 
on the Hudi table, the results are correct: +I first record, -U first record, 
+U second record.
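
To illustrate why three +I records skew a downstream groupBy, here is a minimal, self-contained sketch. It is not Hudi or Flink code: the `Change` class, the `Kind` enum, the key `id1`, and the values 10 and 20 are hypothetical and chosen only for illustration. It sums a value per key, treating -U as a retraction, and compares the expected changelog with the one observed on the first query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogSumDemo {
    /** Hypothetical flags mirroring +I / -U / +U from the description. */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER }

    /** Hypothetical change record: flag, primary key, numeric value. */
    static final class Change {
        final Kind kind;
        final String key;
        final long value;

        Change(Kind kind, String key, long value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    /** Sums values per key; a -U row subtracts, every other row adds. */
    static Map<String, Long> sumByKey(List<Change> changes) {
        Map<String, Long> totals = new HashMap<>();
        for (Change c : changes) {
            long delta = (c.kind == Kind.UPDATE_BEFORE) ? -c.value : c.value;
            totals.merge(c.key, delta, Long::sum);
        }
        return totals;
    }

    /** Expected changelog: +I first, -U first, +U second. */
    static List<Change> expectedChangelog() {
        return List.of(
            new Change(Kind.INSERT, "id1", 10),
            new Change(Kind.UPDATE_BEFORE, "id1", 10),
            new Change(Kind.UPDATE_AFTER, "id1", 20));
    }

    /** Changelog observed on the first query: three +I rows. */
    static List<Change> observedChangelog() {
        return List.of(
            new Change(Kind.INSERT, "id1", 10),
            new Change(Kind.INSERT, "id1", 10),
            new Change(Kind.INSERT, "id1", 20));
    }

    public static void main(String[] args) {
        System.out.println(sumByKey(expectedChangelog())); // {id1=20}
        System.out.println(sumByKey(observedChangelog())); // {id1=40}
    }
}
```

With the retraction in place the sum reflects only the latest record (20). Without it, all three rows are counted and the sum is 40.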
    
   Reason:
   There is a bug in the program: when no data log file has been generated, 
the schema does not include the `_hoodie_operation` column.
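
The effect of a missing `_hoodie_operation` column can be pictured with the following sketch. It is an assumption-laden illustration, not the actual Hudi reader: `resolveFlag`, the list-based schema and row representation, and the fallback to +I are all hypothetical, inferred from the symptom that every row was reported as +I.

```java
import java.util.List;

public class OperationFieldDemo {
    /** Column name taken from the description above. */
    static final String OPERATION_FIELD = "_hoodie_operation";

    /**
     * Returns the change flag stored in the row, or "+I" when the schema
     * has no operation column (hypothetical fallback for illustration).
     */
    static String resolveFlag(List<String> schemaFields, List<String> rowValues) {
        int pos = schemaFields.indexOf(OPERATION_FIELD);
        return pos < 0 ? "+I" : rowValues.get(pos);
    }

    public static void main(String[] args) {
        // Schema resolved without the operation column: the flag is lost.
        System.out.println(resolveFlag(List.of("id", "name"), List.of("1", "a")));
        // Schema that includes the operation column: the flag is preserved.
        System.out.println(resolveFlag(
            List.of(OPERATION_FIELD, "id", "name"), List.of("-U", "1", "a")));
    }
}
```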

