[
https://issues.apache.org/jira/browse/HUDI-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445165#comment-17445165
]
Vinoth Chandar commented on HUDI-2751:
--------------------------------------
We should probably only read from logs for incremental reads? We have talked
about compaction retaining the original _hoodie_commit_time_ of the records in
the log, i.e. even though instant 100 exists, its records will have
"_hoodie_commit_time" < 100 and we can skip them?
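The suggestion above can be sketched as a simple commit-time filter. This is an illustrative sketch only, assuming compaction preserves each record's original _hoodie_commit_time; the `Record` class and `filterConsumeRange` method are hypothetical stand-ins, not Hudi APIs:

```java
// Illustrative sketch, not Hudi code: if compaction retained each record's
// original _hoodie_commit_time, an incremental reader could drop records
// from the compacted base files whose commit time precedes the start of the
// current consume range, since they were already consumed from the logs.
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalFilter {

    // Minimal stand-in for a Hudi record with its commit-time metadata field.
    static final class Record {
        final String hoodieCommitTime; // value of _hoodie_commit_time
        final String key;

        Record(String hoodieCommitTime, String key) {
            this.hoodieCommitTime = hoodieCommitTime;
            this.key = key;
        }
    }

    /** Keep only records first written at or after the consume-range start. */
    static List<Record> filterConsumeRange(List<Record> records, String rangeStart) {
        // Hudi instant times are fixed-width timestamps, so lexicographic
        // comparison matches chronological order.
        return records.stream()
                .filter(r -> r.hoodieCommitTime.compareTo(rangeStart) >= 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Base files written by the compaction contain records originally
        // committed at instant 099; for a range starting at 100 they are skipped.
        List<Record> fromCompaction = List.of(
                new Record("099", "k1"),
                new Record("101", "k2"));
        List<Record> emitted = filterConsumeRange(fromCompaction, "100");
        emitted.forEach(r -> System.out.println(r.key)); // prints only k2
    }
}
```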
> To avoid the duplicates for streaming read MOR table
> ----------------------------------------------------
>
> Key: HUDI-2751
> URL: https://issues.apache.org/jira/browse/HUDI-2751
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Common Core
> Reporter: Danny Chen
> Priority: Major
> Fix For: 0.11.0
>
>
> Imagine there are commits on the timeline:
> {code:java}
>                inflight compaction                    complete compaction
>                         |                                      |
> ----- instant 99 - instant 100 ----- 101 ----- 102 ----- instant 100 -----
>    first read ->|                   second read ->|
> -- range 1 ----|-------------------- range 2 --------------------|
> {code}
> instant 99, 101, 102 are successful non-compaction delta commits;
> instant 100 is the compaction instant.
> The first incremental read consumes up to instant 99, and the second read
> consumes from instant 100 to instant 102; the second read would consume the
> commit files of instant 100, which have already been consumed before.
> The duplicate read happens when this condition is met: a compaction
> instant is scheduled and then completes within *one* consume range.
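The triggering condition can be expressed as a small predicate over a simplified timeline model. The types below are hypothetical illustrations, not Hudi's timeline API: a compaction whose instant time falls inside the current consume range and has completed by read time means its base files would be re-read.

```java
// Simplified, illustrative model of the triggering condition described in
// the issue; these types are not Hudi's timeline API.
public class CompactionDuplicateCheck {

    static final class CompactionInstant {
        final String instantTime; // time the compaction was scheduled
        final boolean completed;  // whether it has completed by read time

        CompactionInstant(String instantTime, boolean completed) {
            this.instantTime = instantTime;
            this.completed = completed;
        }
    }

    /**
     * True when a compaction was scheduled and completed inside the current
     * consume range [rangeStart, rangeEnd]: its base files contain records
     * already consumed from the logs, so reading them duplicates data.
     */
    static boolean triggersDuplicateRead(CompactionInstant c,
                                         String rangeStart, String rangeEnd) {
        return c.completed
                && c.instantTime.compareTo(rangeStart) >= 0
                && c.instantTime.compareTo(rangeEnd) <= 0;
    }

    public static void main(String[] args) {
        // Compaction instant 100 completes inside the second consume
        // range [100, 102] from the description above.
        CompactionInstant compaction100 = new CompactionInstant("100", true);
        System.out.println(triggersDuplicateRead(compaction100, "100", "102")); // true

        // While still inflight (first read), it does not trigger duplicates.
        CompactionInstant inflight100 = new CompactionInstant("100", false);
        System.out.println(triggersDuplicateRead(inflight100, "099", "099")); // false
    }
}
```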
--
This message was sent by Atlassian Jira
(v8.20.1#820001)