[ 
https://issues.apache.org/jira/browse/HUDI-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3194:
--------------------------------------
    Component/s: compaction
                 writer-core

> Fix invisible writes(commits) during compaction 
> (HoodieParquetRealtimeInputFormat)
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-3194
>                 URL: https://issues.apache.org/jira/browse/HUDI-3194
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: compaction, writer-core
>            Reporter: Yuwei Xiao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Suppose a compaction (with instant A) is in progress. All writes related to 
> the compaction (i.e., writes that touch the file groups under compaction) 
> will end up with timestamp A.
> With the current `HoodieParquetRealtimeInputFormat` implementation, even 
> after these writes complete, their records stay invisible until the 
> compaction completes.
> The following pseudocode reproduces the case:
> ```
> write 200 records and complete
> scheduleCompaction
> write 200 records and complete
> read the table and only get 200 records
> ```
> Note: the Spark read path is correct and covers the corner cases during 
> compaction, but the Hive path (and also Presto) is incorrect.
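
The reproduction steps quoted above can be illustrated with a small toy model. This is only a sketch and does not use any Hudi API: `LogFile`, `naive_visible_records`, and `fixed_visible_records` are hypothetical names, and the "naive" filter is an assumed illustration of how a reader could hide completed writes, not the actual `HoodieParquetRealtimeInputFormat` logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFile:
    """Toy stand-in for a log file in a file group (hypothetical, not a Hudi class)."""
    base_instant: str    # timestamp the write ends up with (A while compaction is pending)
    commit_instant: str  # the write (delta commit) that produced the records
    records: int


def naive_visible_records(logs, completed):
    # Assumed buggy behaviour: a log is hidden unless the instant it is filed
    # under (base_instant) has completed, even if its own write has completed.
    return sum(
        log.records
        for log in logs
        if log.base_instant in completed and log.commit_instant in completed
    )


def fixed_visible_records(logs, completed):
    # Intended behaviour: visibility depends only on whether the write completed.
    return sum(log.records for log in logs if log.commit_instant in completed)


# Timeline: write "001" completes, compaction "002" (instant A) is scheduled
# but still pending, write "003" completes.
completed_instants = {"001", "003"}
logs = [
    LogFile(base_instant="001", commit_instant="001", records=200),
    # The second write touches the file group under compaction, so it ends up
    # with the pending compaction's timestamp A = "002".
    LogFile(base_instant="002", commit_instant="003", records=200),
]

naive_total = naive_visible_records(logs, completed_instants)
fixed_total = fixed_visible_records(logs, completed_instants)
print(naive_total, fixed_total)
```

In this model the naive reader returns only the first 200 records, matching the symptom in the report, while the reader that checks only the write's own completion sees both batches.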



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
