[
https://issues.apache.org/jira/browse/HUDI-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908545#comment-17908545
]
Lin Liu edited comment on HUDI-8758 at 12/28/24 12:17 AM:
----------------------------------------------------------
For insert operations, hoodie.datasource.insert.dup.policy can take one of three values:
# fail: when a duplicate is found, throw an exception;
# drop: when a duplicate is found, drop the new record;
# none: when a duplicate is found, treat it as a new record, so both copies are kept.
So for the file group (fg) reader:
For the `drop` and `fail` cases, the table should never contain duplicates, so the
existing workflow should work.
For the `none` case, we need to check whether the table is an `insert` table and,
if it is, we should not merge duplicate records.
To simplify the above logic, we only need to determine whether a table is an
`insert` table and, if it is, skip merging records that share the same key.
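The logic above can be illustrated with a minimal sketch. This is not Hudi code: the record shape, `apply_insert_dup_policy`, `read_file_group`, and the `is_insert_table` flag are all hypothetical names chosen for illustration, and the "latest ordering value wins" merge is only an assumed stand-in for whatever merge the reader actually performs.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (record key, ordering value, payload).
Record = Tuple[str, int, str]

VALID_POLICIES = ("fail", "drop", "none")


def apply_insert_dup_policy(
    existing: List[Record], incoming: List[Record], policy: str
) -> List[Record]:
    """Simulate an insert under the given duplicate policy (illustrative only)."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    seen = {key for key, _, _ in existing}
    result = list(existing)
    for rec in incoming:
        key = rec[0]
        if key in seen:
            if policy == "fail":
                raise ValueError(f"duplicate key: {key}")
            if policy == "drop":
                continue  # keep the existing record, discard the new one
            # policy == "none": fall through and keep the duplicate
        seen.add(key)
        result.append(rec)
    return result


def read_file_group(records: List[Record], is_insert_table: bool) -> List[Record]:
    """Return the records a reader would expose (illustrative only)."""
    if is_insert_table:
        # Insert table: records sharing a key are distinct rows, do not merge.
        return list(records)
    # Otherwise merge by key; assumed rule: highest ordering value wins.
    latest: Dict[str, Record] = {}
    for rec in records:
        key, order, _ = rec
        if key not in latest or order >= latest[key][1]:
            latest[key] = rec
    return list(latest.values())
```

With `drop` or `fail`, the written data has no duplicate keys, so merging by key and not merging return the same rows. Only with `none` does the `is_insert_table` check change what the reader returns.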
was (Author: JIRAUSER301185):
During insert operation, there are three values for this operation:
# fail: when a duplicate is found, throw;
# drop: when a duplicate is found, drop the new record;
# none: when a duplicate is found, we just treat it as new record.
So for fg reader,
for `drop` and `fail` cases, we should not see any duplicates, the existing
workflow should work.
for `none` case, we need to check if the table is an `insert` table, and we
should not merge duplicate records.
> hoodie.datasource.insert.dup.policy interplay with file group reader
> --------------------------------------------------------------------
>
> Key: HUDI-8758
> URL: https://issues.apache.org/jira/browse/HUDI-8758
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.1
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Check hoodie.datasource.insert.dup.policy set to different values and make
> sure fg reader can read tables generated by writes in both cases.