[jira] [Comment Edited] (HUDI-8758) hoodie.datasource.insert.dup.policy interplay with file group reader

Lin Liu (Jira) Fri, 27 Dec 2024 16:18:21 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908545#comment-17908545
 ]


Lin Liu edited comment on HUDI-8758 at 12/28/24 12:17 AM:
----------------------------------------------------------

For insert operations, `hoodie.datasource.insert.dup.policy` accepts three values:
 # fail: when a duplicate is found, throw an exception;
 # drop: when a duplicate is found, drop the new record;
 # none: when a duplicate is found, treat it as a new record.
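
As a rough illustration of the three values above, here is a toy Python model. 
It is not Hudi code: `apply_insert_dup_policy`, `DuplicateKeyError`, and the 
(key, payload) record shape are hypothetical, and it assumes a "duplicate" means 
an incoming key that already exists in the table.

```python
from typing import Dict, Iterable, List, Tuple

# (record key, payload); a toy stand-in, not a Hudi type.
Record = Tuple[str, Dict]

VALID_POLICIES = ("none", "drop", "fail")


class DuplicateKeyError(Exception):
    """Raised by the toy `fail` policy when an incoming key already exists."""


def apply_insert_dup_policy(
    existing_keys: Iterable[str],
    incoming: Iterable[Record],
    policy: str,
) -> List[Record]:
    """Return the incoming records that would be written under `policy`.

    Assumption: only keys already present in the table count as duplicates;
    duplicates inside the incoming batch itself are not checked here.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")

    existing = set(existing_keys)
    accepted: List[Record] = []
    for key, payload in incoming:
        is_duplicate = key in existing
        if is_duplicate and policy == "fail":
            # fail: a duplicate aborts the insert.
            raise DuplicateKeyError(f"duplicate record key: {key}")
        if is_duplicate and policy == "drop":
            # drop: the new record is skipped, the existing one stays.
            continue
        # none (or no duplicate): the record is written as a new record.
        accepted.append((key, payload))
    return accepted
```

With `none`, the duplicate is kept, so the table ends up holding two records 
with the same key; that is the case the reader has to handle.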

So for the file group (fg) reader:

For the `drop` and `fail` cases, the table should not contain any duplicates, 
so the existing workflow should work.

For the `none` case, we need to check whether the table is an `insert` table 
and, if it is, we should not merge duplicate records.

 

To simplify the above logic, we only need to determine whether a table is an 
`insert` table and, if it is, not merge records with the same key.
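
The simplified rule can be sketched in a few lines of Python. This is only a 
toy model, not the file group reader: `read_file_group` is a hypothetical name, 
how a table is recognised as an `insert` table is left out and passed in as the 
`is_insert_table` flag, and the "latest record wins" merge is an assumed 
stand-in for whatever merge logic the reader actually applies.

```python
from typing import Dict, Iterable, List, Tuple

# (record key, payload); a toy stand-in, not a Hudi type.
Record = Tuple[str, Dict]


def read_file_group(records: Iterable[Record], is_insert_table: bool) -> List[Record]:
    """Return the records a reader would emit for one file group.

    Assumption: `records` is ordered oldest to newest, and merging means
    keeping only the latest payload per key.
    """
    if is_insert_table:
        # Insert table: records sharing a key are distinct rows, so no merge.
        return list(records)

    merged: Dict[str, Dict] = {}
    for key, payload in records:
        # Non-insert table: a later record with the same key replaces the earlier one.
        merged[key] = payload
    return list(merged.items())
```

For example, given two records with key `k1`, the insert-table path returns 
both, while the merging path returns only the later one.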


was (Author: JIRAUSER301185):
During insert operation,  there are three values for this operation:
 # fail: when a duplicate is found, throw;
 # drop: when a duplicate is found, drop the new record;
 # none: when a duplicate is found, we just treat it as new record.

So for fg reader, 

for `drop` and `fail` cases, we should not see any duplicates, the existing 
workflow should work.

for `none` case, we need to check if the table is an `insert` table, and we 
should not merge duplicate records.

> hoodie.datasource.insert.dup.policy interplay with file group reader
> --------------------------------------------------------------------
>
>                 Key: HUDI-8758
>                 URL: https://issues.apache.org/jira/browse/HUDI-8758
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Check hoodie.datasource.insert.dup.policy set to different values and make 
> sure fg reader can read tables generated by writes in both cases.



