Prashant Wason created HUDI-2923:
------------------------------------
Summary: Unable to read from metadata table when a compaction is
in progress or has failed
Key: HUDI-2923
URL: https://issues.apache.org/jira/browse/HUDI-2923
Project: Apache Hudi
Issue Type: Bug
Reporter: Prashant Wason
Assignee: Prashant Wason
Fix For: 0.10.0
When reading from the metadata table, the [readers are opened with the latest file
slices|https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java#L248].
When a compaction is in progress, the latest file slice has neither a base
file nor any log files (yet). Hence reads from the metadata table find no
data.
There are two cases here:
# Compaction eventually completes: We will be able to read data from the
metadata table.
# Compaction fails: We will not be able to read data until compaction runs
again. This can be a fatal issue if the next writer tries to perform
an update that requires listing partitions from the metadata table.
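The following is a minimal, self-contained sketch of the situation described above, not Hudi code. The class PendingCompactionSketch, the Slice type, and the methods latest and latestMerged are hypothetical names made up for illustration; only the instant times and file names are taken from the unit-test logs below. It shows why picking the latest file slice yields nothing while a compaction is pending, and one possible way a reader could fall back to the previous slice. Whether the actual fix takes this approach is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PendingCompactionSketch {

    // Hypothetical stand-in for a file slice: base instant, optional base file, log files.
    public static final class Slice {
        public final String baseInstant;
        public final String baseFile; // null when no base file has been written yet
        public final List<String> logFiles;

        public Slice(String baseInstant, String baseFile, List<String> logFiles) {
            this.baseInstant = baseInstant;
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }

        // A slice with no base file and no log files has nothing to read.
        public boolean isEmpty() {
            return baseFile == null && logFiles.isEmpty();
        }
    }

    // Naive selection: always take the newest slice of the file group.
    public static Slice latest(List<Slice> slices) {
        return slices.get(slices.size() - 1);
    }

    // Assumed workaround: if the newest slice belongs to a pending compaction,
    // read the previous slice's base file plus the log files of both slices.
    public static Slice latestMerged(List<Slice> slices, String pendingCompactionInstant) {
        Slice last = latest(slices);
        if (slices.size() < 2 || !last.baseInstant.equals(pendingCompactionInstant)) {
            return last;
        }
        Slice previous = slices.get(slices.size() - 2);
        List<String> logs = new ArrayList<>(previous.logFiles);
        logs.addAll(last.logFiles);
        return new Slice(previous.baseInstant, previous.baseFile, logs);
    }

    public static void main(String[] args) {
        // Values copied from the logs: base instant 001, pending compaction at 002001.
        List<Slice> slices = Arrays.asList(
            new Slice("001",
                "2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0_0-16-20_001.hfile",
                Collections.singletonList(
                    ".2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0_001.log.1_0-34-36")),
            new Slice("002001", null, Collections.<String>emptyList()));

        System.out.println("latest slice empty: " + latest(slices).isEmpty());
        System.out.println("merged slice empty: " + latestMerged(slices, "002001").isEmpty());
    }
}
```

In this model the naive lookup returns the empty slice created at instant 002001, which matches the "Opened metadata log files from []" line in the logs, while the merged lookup still sees the base file and log file written at instant 001.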
Relevant logs from a unit test:
13084 [main] INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView - *Pending
Compaction instant for* (FileSlice
\{fileGroupId=HoodieFileGroupId{partitionPath='files',
fileId='2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0'}, baseCommitTime=002001,
baseFile='null', logFiles='[]'}) is
:Option\{val=(002001,CompactionOperation{baseInstantTime='001',
dataFileCommitTime=Option{val=001},
deltaFileNames=[.2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0_001.log.1_0-34-36],
dataFileName=Option\{val=2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0_0-16-20_001.hfile},
id='HoodieFileGroupId\{partitionPath='files',
fileId='2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0'}', metrics={},
bootstrapFilePath=Optional.empty})}
13084 [main] INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView - File Slice
(FileSlice \{fileGroupId=HoodieFileGroupId{partitionPath='files',
fileId='2733a6ef-4bfd-444d-91bd-b42c3b66a84e-0'}, *baseCommitTime=002001,
baseFile='null', logFiles='[]'}) is in pending compaction*
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner - *Number of log
files scanned => 0*
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner -
MaxMemoryInBytes allowed for compaction => 0
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner - Number of
entries in MemoryBasedMap in ExternalSpillableMap => 0
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner - Total size in
bytes of MemoryBasedMap in ExternalSpillableMap => 0
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner - Number of
entries in DiskBasedMap in ExternalSpillableMap => 0
13089 [main] INFO
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner - Size of file
spilled to disk => 0
13089 [main] INFO org.apache.hudi.metadata.HoodieBackedTableMetadata -
*Opened metadata log files from []* at instant (dataset instant=002, metadata
instant=002) in 2 ms
13089 [main] INFO org.apache.hudi.metadata.HoodieBackedTableMetadata -
Metadata read for key __all_partitions__ took [baseFileRead, logMerge] [0, 0] ms
13090 [main] INFO org.apache.hudi.metadata.BaseTableMetadata - *Listed
partitions from metadata: #partitions=0*
--
This message was sent by Atlassian Jira
(v8.20.1#820001)