aliceyyan opened a new pull request, #5626:
URL: https://github.com/apache/hudi/pull/5626
## What is the purpose of the pull request
The first read result is incorrect when the Flink upsert-kafka connector is
used with Hudi.
## Brief change log
- Add a `tableDataExists` method to class `StreamerUtil`.
- Modify the `metaClientForReader` method in class `StreamerUtil`.
## Committer checklist
- [4119] Has a corresponding JIRA in PR title & commit
The first read result is incorrect when the Flink upsert-kafka connector is
used with Hudi.
ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, queried in
streaming mode)
Here is the case:
1. First query: write two records with the same primary key to Kafka and
insert them into the Hudi table. The query result should be three rows: +I
first record, -U first record, +U second record. But the first time I queried
the Hudi table, every row was +I (+I first record, +I first record, +I second
record), with no update operations. The three +I rows break the subsequent
ETL process on the Hudi table: the groupBy results are inaccurate.
2. Second query: stop the first query and restart the query job on the Hudi
table. The results are now correct: +I first record, -U first record, +U
second record.
Reason: there is a bug in the program. When no data log file has been
generated, the schema does not include the `_hoodie_operation` column.
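
To make the symptom concrete, here is a minimal, self-contained sketch of the expected changelog versus what I observed. It is a toy model only, not Hudi or Flink code: the class `ChangelogSketch` and both methods are hypothetical names, and treating every row as +I when the operation column is absent is an assumption that matches the observed output, not a statement about the reader's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of this is Hudi or Flink API.
class ChangelogSketch {

    // Converts a sequence of upserts (key, value) into changelog rows:
    // the first write of a key is +I, later writes are -U (old) then +U (new).
    static List<String> toChangelog(List<String[]> upserts) {
        Map<String, String> latest = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String[] kv : upserts) {
            String key = kv[0];
            String value = kv[1];
            String previous = latest.put(key, value);
            if (previous == null) {
                out.add("+I(" + key + "," + value + ")");
            } else {
                out.add("-U(" + key + "," + previous + ")");
                out.add("+U(" + key + "," + value + ")");
            }
        }
        return out;
    }

    // Assumed model of the bug: with no operation column in the schema,
    // the row kind is lost and every row is reported as +I.
    static List<String> withoutOperationColumn(List<String> changelog) {
        List<String> out = new ArrayList<>();
        for (String row : changelog) {
            out.add("+I" + row.substring(2)); // drop the 2-char row kind prefix
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> upserts = Arrays.asList(
                new String[] {"id1", "first"},
                new String[] {"id1", "second"});
        List<String> expected = toChangelog(upserts);
        System.out.println("expected: " + expected);
        System.out.println("observed: " + withoutOperationColumn(expected));
    }
}
```

In this model, the three +I rows would be counted as three independent inserts by a downstream aggregation, which is consistent with the inaccurate groupBy results described in case 1.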