[ https://issues.apache.org/jira/browse/HUDI-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-3194:
--------------------------------------
    Component/s: compaction
                 writer-core

> Fix invisible writes(commits) during compaction (HoodieParquetRealtimeInputFormat)
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-3194
>                 URL: https://issues.apache.org/jira/browse/HUDI-3194
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: compaction, writer-core
>            Reporter: Yuwei Xiao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Suppose a compaction (with instant A) is in progress. All writes related to
> the compaction (i.e., writes that touch the file groups under compaction)
> end up with timestamp A.
> With the current `HoodieParquetRealtimeInputFormat` implementation, the
> records from these writes stay invisible even after the writes complete,
> until the compaction itself completes.
> The following pseudocode reproduces the issue:
> ```
> write 200 records and complete
> scheduleCompaction
> write 200 records and complete
> read the table and only get 200 records
> ```
> Note that the Spark read path is correct and handles these corner cases
> during compaction, but the Hive path (and likewise the Presto path) is wrong.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
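The following is a minimal Python sketch of the visibility problem in the description. It rests on simplifying assumptions. `FileSlice`, `naive_visible_records` and `merged_visible_records` are hypothetical names, not Hudi APIs. A file slice is reduced to plain record counts, and instant times are assumed to be fixed-width strings that sort lexicographically. The second batch of 200 records is assumed to use new keys.

The naive reader only trusts slices whose base instant is completed, so log files keyed to the pending compaction instant A are dropped. The merged reader folds those logs into the previous slice, which is roughly how a read path has to treat a pending compaction.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FileSlice:
    """Hypothetical, simplified model of one file slice in a file group."""
    base_instant: str                 # instant time the slice is keyed by
    base_records: int = 0             # records in the base file (0 if not written yet)
    log_records: List[int] = field(default_factory=list)  # record count per log file


def naive_visible_records(slices: List[FileSlice], completed: Set[str]) -> int:
    """Reads only the latest slice whose base instant is completed.

    Logs written under a pending compaction instant are skipped, which
    reproduces the 'invisible writes' symptom.
    """
    visible = [s for s in slices if s.base_instant in completed]
    latest = max(visible, key=lambda s: s.base_instant)
    return latest.base_records + sum(latest.log_records)


def merged_visible_records(slices: List[FileSlice], completed: Set[str],
                           pending_compactions: Set[str]) -> int:
    """Folds the logs of a pending-compaction slice into the previous slice."""
    ordered = sorted(slices, key=lambda s: s.base_instant)
    latest = ordered[-1]
    if latest.base_instant in pending_compactions and len(ordered) > 1:
        previous = ordered[-2]
        return (previous.base_records + sum(previous.log_records)
                + sum(latest.log_records))
    # No pending compaction on the latest slice: the naive rule is sufficient.
    return naive_visible_records(slices, completed)


# Scenario from the repro: 200 records committed at "001", compaction scheduled
# at "002", then 200 more records written by commit "003". Those logs carry
# base instant "002", the pending compaction instant.
slices = [FileSlice("001", base_records=200), FileSlice("002", log_records=[200])]
completed = {"001", "003"}
pending = {"002"}
print(naive_visible_records(slices, completed))            # 200
print(merged_visible_records(slices, completed, pending))  # 400
```

The sketch deliberately ignores per-commit filtering inside log files. A real reader would also have to skip log blocks that belong to commits that are still inflight.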