[ https://issues.apache.org/jira/browse/HUDI-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Y Ethan Guo updated HUDI-8142:
------------------------------
Description: When multiple writers try to insert records with the same record
key that does not exist in the table, both writers can succeed, leading to
duplicates in the table. In practice, this may not happen in common use cases
(e.g., streaming CDC logs to a table through Kafka). Still, if we'd like to
strictly guarantee key uniqueness for concurrent inserts, we may need the Hudi
index to be aware of multiple writers (right now the index only reads committed
data), so that we can efficiently identify the same record key across multiple
writers.
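
The race described above can be illustrated with a toy model. This is only a
sketch under stated assumptions: ToyTable, KeyConflict, plan_committed_only,
and plan_writer_aware are hypothetical names made up for illustration and are
not Hudi APIs, and the in-memory "pending" map merely stands in for whatever
mechanism a real index might use to see other writers' inflight inserts.

```python
from collections import Counter


class KeyConflict(Exception):
    """Raised when another writer already has an inflight insert for a key."""


class ToyTable:
    """Toy stand-in for a table plus its key index (hypothetical, not Hudi)."""

    def __init__(self):
        self.committed = Counter()  # record key -> number of committed copies
        self.pending = {}           # record key -> writer with an inflight insert

    def plan_committed_only(self, keys):
        # Index lookup that sees committed data only: any key that is not
        # yet committed is treated as a new insert.
        return [k for k in keys if self.committed[k] == 0]

    def plan_writer_aware(self, writer, keys):
        # Index lookup that also sees inflight inserts from other writers.
        conflicts = [k for k in keys if self.pending.get(k, writer) != writer]
        if conflicts:
            raise KeyConflict(f"{writer} conflicts on {conflicts}")
        new_keys = [k for k in keys if self.committed[k] == 0]
        for k in new_keys:
            self.pending[k] = writer  # make the inflight insert visible
        return new_keys

    def commit(self, writer, new_keys):
        for k in new_keys:
            self.committed[k] += 1
            if self.pending.get(k) == writer:
                del self.pending[k]


def run_committed_only():
    """Both writers plan before either commits, so both insert the key."""
    table = ToyTable()
    plan_a = table.plan_committed_only(["k1"])
    plan_b = table.plan_committed_only(["k1"])  # A has not committed yet
    table.commit("A", plan_a)
    table.commit("B", plan_b)
    return table.committed["k1"]


def run_writer_aware():
    """Writer B sees A's inflight insert and backs off."""
    table = ToyTable()
    plan_a = table.plan_writer_aware("A", ["k1"])
    try:
        plan_b = table.plan_writer_aware("B", ["k1"])
    except KeyConflict:
        plan_b = []  # B aborts (or could retry as an update)
    table.commit("A", plan_a)
    table.commit("B", plan_b)
    return table.committed["k1"]
```

In this model, run_committed_only() leaves two copies of "k1" in the table,
while run_writer_aware() leaves one. How a real index would expose inflight
keys efficiently is the open question of this issue and is not addressed by
the sketch.
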
> Multi-writer unique key enforcement for OCC/NBCC
> ------------------------------------------------
>
> Key: HUDI-8142
> URL: https://issues.apache.org/jira/browse/HUDI-8142
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Ethan Guo (this is the old account; please use "yihua")
> Assignee: Danny Chen
> Priority: Critical
> Fix For: 1.1.0
>
>
> When multiple writers try to insert records with the same record key that
> does not exist in the table, both writers can succeed, leading to duplicates
> in the table. In practice, this may not happen in common use cases (e.g.,
> streaming CDC logs to a table through Kafka). Still, if we'd like to strictly
> guarantee key uniqueness for concurrent inserts, we may need the Hudi index
> to be aware of multiple writers (right now the index only reads committed
> data), so that we can efficiently identify the same record key across
> multiple writers.